Prediction of a hit drama with a pattern analysis on early viewing ratings (초기 시청시간 패턴 분석을 통한 대흥행 드라마 예측)

  • Nam, Kihwan;Seong, Nohyoon
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.33-49 / 2018
  • The success of a TV drama has a large impact on TV ratings and on the effectiveness of channel promotion. Its cultural and business impact has also been demonstrated through the Korean Wave. Early prediction of a blockbuster drama is therefore very important from the strategic perspective of the media industry. Previous studies have tried to predict audience ratings and drama success with various methods. However, most of them made simple predictions from intuitive factors such as the lead actor and the time slot, which limits their predictive power. In this study, we propose a model that predicts the popularity of a drama by analyzing viewers' viewing patterns on the basis of various theories. The contribution is both theoretical and practical, since the model can be used by actual broadcasting companies. We collected data on 280 TV mini-series dramas broadcast on terrestrial channels over the 10 years from 2003 to 2012. From these data, we selected the most highly ranked and the least highly ranked 45 TV dramas and analyzed their viewing patterns in 11 steps. The assumptions and conditions for modeling are based on existing studies, on the opinions of actual broadcasters, and on data mining techniques. We then developed a prediction model by measuring the viewing-time distance (difference) with the Euclidean and correlation methods; the sum of these distances is termed similarity in our study. Using this similarity measure, we predicted the success of dramas from the distribution of viewers' initial viewing-time patterns over episodes 1 to 5. To check whether the model is sensitive to the measurement method, various distance measures were applied and the robustness of the model was verified. Once the model was established, we improved its predictive performance using a grid search. 
Furthermore, we classified viewers who watched more than 70% of the total airtime of a newly broadcast drama as "passionate viewers". We then compared the percentage of passionate viewers between the most highly ranked and the least highly ranked dramas in order to determine the likelihood of a blockbuster TV mini-series. We find that the initial viewing-time pattern is the key factor in predicting blockbuster dramas. With the initial viewing-time pattern analysis, our model correctly classified blockbuster dramas with 75.47% accuracy. This paper shows a high prediction rate while suggesting an audience rating method different from existing ones. Currently, broadcasters rely heavily on a few famous actors, the so-called star system, and they face more severe competition than ever because of rising production costs, a long-term recession, and aggressive investment by comprehensive programming channels and large corporations. Everyone is in a financially difficult situation. The basic revenue model of these broadcasters is advertising, and advertising is executed with the audience rating as the basic index. Demand in the drama market is difficult to forecast because of the nature of the commodity, yet dramas make a high financial contribution to the success of a broadcasting company's content. It is therefore important to minimize the risk of failure. Analyzing the distribution of initial viewing time can give related companies practical help in establishing a response strategy (organization, marketing, story changes, etc.). We also found that audience behavior is crucial to the success of a program. In this paper, we measure TV viewing by how enthusiastically a program is watched. 
We can successfully predict the success of a program by calculating the loyalty of its passionate viewers. This way of calculating loyalty can also be applied to various platforms. It can also be used for marketing programs such as highlights, script previews, making-of films, characters, games, and other marketing projects.
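
The distance-based prediction described in this abstract can be illustrated with a minimal Python sketch. It assumes that each drama's early viewing pattern is an 11-bin distribution of viewing time, scores a new drama by its summed distance to reference hits and to reference flops, and predicts a hit when the hit group is closer. The reference patterns, the decision rule, and all function names are hypothetical; the abstract does not give the authors' exact procedure.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two equally long viewing-time patterns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def correlation_distance(p, q):
    """1 - Pearson correlation; assumes neither pattern is constant."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    norm_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    norm_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    return 1.0 - cov / (norm_p * norm_q)

def summed_distance(pattern, references, distance):
    """Sum of distances from one pattern to every reference pattern."""
    return sum(distance(pattern, ref) for ref in references)

def predict_hit(pattern, hit_refs, flop_refs, distance=euclidean_distance):
    # Assumed decision rule: the drama is a predicted hit when it is closer
    # (smaller summed distance) to the hits than to the flops.
    # Both reference groups should have the same size for a fair comparison.
    to_hits = summed_distance(pattern, hit_refs, distance)
    to_flops = summed_distance(pattern, flop_refs, distance)
    return to_hits < to_flops

def passionate_share(watched_fractions, threshold=0.7):
    """Share of viewers who watched more than `threshold` of the airtime."""
    passionate = sum(1 for f in watched_fractions if f > threshold)
    return passionate / len(watched_fractions)

# Hypothetical 11-bin patterns (share of viewers per viewing-time bin).
HIT_REFS = [
    [0.05, 0.04, 0.04, 0.05, 0.06, 0.07, 0.08, 0.10, 0.13, 0.17, 0.21],
    [0.06, 0.05, 0.05, 0.05, 0.06, 0.07, 0.08, 0.10, 0.12, 0.16, 0.20],
]
FLOP_REFS = [
    [0.21, 0.17, 0.13, 0.10, 0.08, 0.07, 0.06, 0.05, 0.04, 0.04, 0.05],
    [0.20, 0.16, 0.12, 0.10, 0.08, 0.07, 0.06, 0.05, 0.05, 0.05, 0.06],
]
NEW_DRAMA = [0.05, 0.05, 0.05, 0.05, 0.06, 0.07, 0.09, 0.10, 0.12, 0.16, 0.20]

if __name__ == "__main__":
    print(predict_hit(NEW_DRAMA, HIT_REFS, FLOP_REFS))
    print(predict_hit(NEW_DRAMA, HIT_REFS, FLOP_REFS, correlation_distance))
```

Passing a different `distance` function is one simple way to repeat the comparison under several distance measures, in the spirit of the robustness check the abstract mentions.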

ICT Company Profiling Analysis and the Mechanism for Performance Creation Depending on the Type of Government Start-up Support Program (정부창업지원 프로그램 참여에 따른 ICT 기업 프로파일링과 성과창출 메커니즘)

  • Ha, Sangjip;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.237-258 / 2022
  • As the global market environment changes, the domestic ICT industry has a growing influence on the world economy. The industry is regarded as an important driving force in the national economy from both a technological and a social point of view. In particular, small and medium-sized enterprises (SMEs) in the ICT industry are regarded as essential actors in domestic economic development in terms of company diversity, technology development, and job creation. However, because SMEs are small compared with large enterprises, it is difficult for them to survive with a differentiated strategy in an incomplete and rapidly changing environment. SMEs must therefore make considerable efforts to improve their own capabilities, and the government needs to provide help suited to their internal resources so that they can remain competitive. This study classifies the types of ICT SMEs participating in government support programs and analyzes the relationship between resources and performance creation for each type. It uses data from the "ICT Small and Medium Enterprises Survey" conducted annually by the Ministry of Science and ICT. In the first step, ICT SMEs were clustered on common factors according to their experience with government support programs. Three meaningful clusters were identified and named "active participation type," "initial support type," and "soloist type." In the second step, the characteristics of the clusters were compared through profiling analysis. In the third step, the mechanism of R&D performance creation for each cluster was examined through regression analysis. Different factors affected performance creation in each cluster, and the magnitude of their influence also differed. 
Specifically, for the "active participation type", "current manpower", "technology competitiveness", and "R&D investment in the previous year" were found to be important factors in creating R&D performance. For the "initial support type", "whether or not a dedicated R&D organization exists", "R&D investment amount in the previous year", "ratio of sales to large companies", and "ratio of vendors supplied to large companies" contributed to performance. Lastly, for the "soloist type", "current workforce", "future recruitment plan", "technological competitiveness", "R&D investment", "large company sales ratio", and "overseas sales ratio" showed a significant relationship with performance. The practical implications of this study are that it shows what strategy should be established when supporting SMEs in the future, depending on their participation in government startup programs, and that it provides a guide on what kind of support should be provided.

Current and Future Perspectives of Lung Organoid and Lung-on-chip in Biomedical and Pharmaceutical Applications

  • Junhyoung Lee;Jimin Park;Sanghun Kim;Esther Han;Sungho Maeng;Jiyou Han
    • Journal of Life Science / v.34 no.5 / pp.339-355 / 2024
  • The pulmonary system is a highly complex system that can only be understood by integrating its functional and structural aspects. Hence, in vivo animal models are generally used for pathological studies of pulmonary diseases and the evaluation of inhalation toxicity. However, to reduce the number of animals used in experimentation and with the consideration of animal welfare, alternative methods have been extensively developed. Notably, the Organization for Economic Co-operation and Development (OECD) and the United States Environmental Protection Agency (USEPA) have agreed to prohibit animal testing after 2030. Therefore, the latest advances in biotechnology are revolutionizing the approach to developing in vitro inhalation models. For example, lung organ-on-a-chip (OoC) and organoid models have been intensively studied alongside advancements in three-dimensional (3D) bioprinting and microfluidic systems. These modeling systems can more precisely imitate the complex biological environment compared to traditional in vivo animal experiments. This review paper addresses multiple aspects of the recent in vitro modeling systems of lung OoC and organoids. It includes discussions on the use of endothelial cells, epithelial cells, and fibroblasts composed of lung alveoli generated from pluripotent stem cells or cancer cells. Moreover, it covers lung air-liquid interface (ALI) systems, transwell membrane materials, and in silico models using artificial intelligence (AI) for the establishment and evaluation of in vitro pulmonary systems.

Application of Deep Learning for Classification of Ancient Korean Roof-end Tile Images (딥러닝을 활용한 고대 수막새 이미지 분류 검토)

  • KIM Younghyun
    • Korean Journal of Heritage: History & Science / v.57 no.3 / pp.24-35 / 2024
  • Recently, research using deep learning technologies such as artificial intelligence, convolutional neural networks, etc. has been actively conducted in various fields including healthcare, manufacturing, autonomous driving, and security, and is having a significant influence on society. In line with this trend, the present study attempted to apply deep learning to the classification of archaeological artifacts, specifically ancient Korean roof-end tiles. Using 100 images of roof-end tiles from each of the Goguryeo, Baekje, and Silla dynasties, for a total of 300 base images, a dataset was formed and expanded to 1,200 images using data augmentation techniques. After building a model using transfer learning from the pre-trained EfficientNetB0 model and conducting five-fold cross-validation, an average training accuracy of 98.06% and validation accuracy of 97.08% were achieved. Furthermore, when model performance was evaluated with a test dataset of 240 images, it could classify the roof-end tile images from the three dynasties with a minimum accuracy of 91%. In particular, with a learning rate of 0.0001, the model exhibited the highest performance, with accuracy of 92.92%, precision of 92.96%, recall of 92.92%, and F1 score of 92.93%. This optimal result was obtained by preventing overfitting and underfitting issues using various learning rate settings and finding the optimal hyperparameters. The study's findings confirm the potential for applying deep learning technologies to the classification of Korean archaeological materials, which is significant. Additionally, it was confirmed that the existing ImageNet dataset and parameters could be positively applied to the analysis of archaeological data. This approach could lead to the creation of various models for future archaeological database accumulation, the use of artifacts in museums, and classification and organization of artifacts.
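
The accuracy, precision, recall, and F1 figures quoted in this abstract are standard classification metrics. The Python sketch below shows one common way to compute them from a confusion matrix, using macro averaging over the three classes. The averaging scheme and the confusion matrix are assumptions made for illustration; the abstract does not state how the authors averaged their metrics, and the counts below are not the study's results.

```python
def macro_metrics(confusion):
    """Accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[c][c] for c in range(k)) / total

    precisions, recalls, f1_scores = [], [], []
    for c in range(k):
        true_positive = confusion[c][c]
        predicted_as_c = sum(confusion[r][c] for r in range(k))
        actually_c = sum(confusion[c])
        precision = true_positive / predicted_as_c if predicted_as_c else 0.0
        recall = true_positive / actually_c if actually_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1_scores) / k,
    }

# Hypothetical 240-image test set, 80 images per class
# (rows and columns: Goguryeo, Baekje, Silla). Not the paper's data.
DEMO_CONFUSION = [
    [75, 3, 2],
    [4, 73, 3],
    [2, 3, 75],
]

if __name__ == "__main__":
    for name, value in macro_metrics(DEMO_CONFUSION).items():
        print(f"{name}: {value:.4f}")
```

With equally sized classes, macro recall equals accuracy. That would be consistent with the identical accuracy and recall values reported above if the test set was balanced, which is an assumption here.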

Empirical Analysis on Bitcoin Price Change by Consumer, Industry and Macro-Economy Variables (비트코인 가격 변화에 관한 실증분석: 소비자, 산업, 그리고 거시변수를 중심으로)

  • Lee, Junsik;Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.195-220 / 2018
  • In this study, we conducted an empirical analysis of the factors that affect changes in the Bitcoin closing price. Previous studies have focused on the security of the blockchain system, the economic ripple effects of cryptocurrency, its legal implications, and consumer acceptance of cryptocurrency. Cryptocurrency has been studied in various areas, and many researchers and others, including governments in many countries, are trying to use cryptocurrency and apply its technology. Despite the rapid and dramatic changes in cryptocurrency prices and their growing effects, empirical studies of the factors affecting those price changes have been lacking. There have been only a few limited studies, business reports, and short working papers. It is therefore necessary to determine which factors affect changes in the Bitcoin closing price. For the analysis, hypotheses were constructed along three dimensions (consumer, industry, and macroeconomy), and time series data were collected for the variables of each dimension. The consumer variables are the search traffic for 'Bitcoin', 'Bitcoin ban', 'ransomware', and 'war'. The industry variables are GPU vendors' stock prices and memory vendors' stock prices. The macroeconomic variables are U.S. dollar index futures, the FOMC policy interest rate, and the WTI crude oil price. Using these variables, we performed a time series regression analysis to find the relationship between them and changes in the Bitcoin closing price. Before the regression analysis, we performed a unit-root test to verify the stationarity of the time series data and avoid spurious regression. We then ran the regression analysis on the stationary data. 
The analysis shows that changes in the Bitcoin closing price are negatively related to the search traffic for 'Bitcoin ban' and to U.S. dollar index futures, while changes in GPU vendors' stock prices and in the WTI crude oil price show positive effects. A 'Bitcoin ban' directly determines whether Bitcoin trading is maintained or abolished, which is why consumers reacted sensitively and affected the Bitcoin closing price. The GPU is a raw material for Bitcoin mining. In general, a rise in a company's stock price reflects growth in sales of its products and services, so increases in GPU demand are indirectly reflected in GPU vendors' stock prices. Our interpretation is that a rise in GPU prices put a crimp on Bitcoin mining. Consequently, GPU vendors' stock prices affect changes in the Bitcoin closing price. We also confirmed that U.S. dollar index futures moved in the opposite direction to changes in the Bitcoin closing price. In this respect Bitcoin moved like gold. Gold is considered a safe asset by consumers, which suggests that consumers regard Bitcoin as a safe asset as well. The WTI oil price, on the other hand, moved in the same direction as the Bitcoin closing price. This implies that Bitcoin is also regarded as an investment asset, like products in the raw materials market. The variables that were not significant in the analysis were the search traffic for 'Bitcoin', 'ransomware', and 'war', memory vendors' stock prices, and the FOMC policy interest rate. For the search traffic for 'Bitcoin', we judged that interest in Bitcoin did not lead to purchases of Bitcoin. Search traffic for 'Bitcoin' therefore did not reflect all of the demand for Bitcoin, which implies that some factors regulate and mediate Bitcoin purchases. For the search traffic for 'ransomware', it is hard to say that concern about ransomware determined overall Bitcoin demand, because only a few people were harmed by ransomware and the percentage of hackers demanding Bitcoin was low. 
In addition, such information security problems are one-off events rather than continuous issues. Search traffic for 'war' was not significant. Like the stock market, Bitcoin generally has a negative relation to war, but in exceptional cases such as the Gulf War, war changes stakeholders' profits and environment. We think that this is such a case. Memory vendors' stock prices were not significant because the memory vendors' flagship products were not VRAM, which is essential for Bitcoin supply. As for the FOMC policy interest rate, when the interest rate is low, surplus capital is invested in securities such as stocks. Bitcoin's price fluctuation was large, however, so consumers did not recognize it as an attractive commodity. In addition, unlike the stock market, Bitcoin has no safety mechanisms such as circuit breakers and sidecars. Through this study, we verified which factors affect changes in the Bitcoin closing price and interpreted why such changes happened. In addition, by establishing the characteristics of Bitcoin as both a safe asset and an investment asset, we provide a guide for how consumers, financial institutions, and government organizations can approach cryptocurrency. Moreover, by corroborating the factors that affect changes in the Bitcoin closing price, this study gives researchers clues about which factors should be considered in future cryptocurrency studies.
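
The two-stage procedure in this abstract (make the series stationary, then regress) can be sketched in a few lines of Python. The sketch assumes that first differencing is the transformation used to obtain stationary series, which the abstract does not state, and it fits an ordinary least squares line with a single regressor instead of the full multi-variable model. It does not implement the unit-root test itself. The price levels are invented for illustration only.

```python
def first_difference(series):
    """Turn price levels into period-to-period changes."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical levels, constructed so that the changes move in opposite
# directions. These are NOT the study's data.
DOLLAR_INDEX = [100, 101, 100, 102, 101, 103]
BITCOIN_CLOSE = [1000, 960, 1020, 930, 990, 900]

if __name__ == "__main__":
    dollar_change = first_difference(DOLLAR_INDEX)
    bitcoin_change = first_difference(BITCOIN_CLOSE)
    slope, intercept = ols_fit(dollar_change, bitcoin_change)
    # A negative slope corresponds to the inverse relationship the
    # abstract reports for U.S. dollar index futures.
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Regressing changes on changes, as here, is one way to avoid the spurious correlation that two trending level series can show even when they are unrelated.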

Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)

  • Kim, Seongjin;Ryoo, Eunchung;Jung, Min Kyu;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.185-202 / 2012
  • Since the value of information has been recognized in the information society, the use and collection of information have become important. A facial expression, like an artistic painting, contains a great deal of information and could be described in thousands of words. Following this idea, there have recently been a number of attempts to provide customers and companies with intelligent services that perceive human emotions from facial expressions. For example, the MIT Media Lab, the leading organization in this research area, has developed a human emotion prediction model and has applied its studies to commercial business. In academia, conventional methods such as Multiple Regression Analysis (MRA) and Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized for its low prediction accuracy. This is inevitable because MRA can only explain a linear relationship between the dependent variable and the independent variables. To mitigate the limitations of MRA, some studies such as Jung and Kim (2012) have used ANN as an alternative and reported that ANN generated more accurate predictions than statistical methods like MRA. However, ANN has also been criticized for overfitting and for the difficulty of network design (e.g., setting the number of layers and the number of nodes in the hidden layers). Against this background, we propose a novel model using Support Vector Regression (SVR) to increase prediction accuracy. SVR is an extension of the Support Vector Machine (SVM) designed to solve regression problems. The model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold ${\varepsilon}$) to the model prediction. 
Using SVR, we tried to build a model that can measure the levels of arousal and valence from facial features. To validate the usefulness of the proposed model, we collected data on facial reactions to appropriate visual stimuli and extracted features from the data. Next, preprocessing steps were taken to choose statistically significant variables. In total, 297 cases were used for the experiment. As comparative models, we also applied MRA and ANN to the same data set. For SVR, we adopted the '${\varepsilon}$-insensitive loss function' and the 'grid search' technique to find the optimal values of parameters such as C, d, ${\sigma}^2$, and ${\varepsilon}$. For ANN, we adopted a standard three-layer backpropagation network, which has a single hidden layer. The learning rate and momentum rate of the ANN were set to 10%, and we used the sigmoid function as the transfer function of the hidden and output nodes. We repeated the experiments while varying the number of nodes in the hidden layer over n/2, n, 3n/2, and 2n, where n is the number of input variables. The stopping condition for the ANN was set to 50,000 learning events. We used MAE (Mean Absolute Error) as the measure for performance comparison. In the experiment, SVR achieved the highest prediction accuracy on the hold-out data set compared with MRA and ANN. Regardless of the target variable (the level of arousal or the level of positive/negative valence), SVR showed the best performance on the hold-out data set. ANN also outperformed MRA, but it showed considerably lower prediction accuracy than SVR for both target variables. The findings of our research are expected to be useful to researchers and practitioners who want to build models for recognizing human emotions.
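
The ${\varepsilon}$-insensitive loss and the MAE measure named in this abstract are both simple to state in code. The Python sketch below computes each of them for a list of predictions. The function names and the sample values are hypothetical, and the sketch covers only the loss and the evaluation metric, not SVR training or the grid search.

```python
def epsilon_insensitive_losses(y_true, y_pred, epsilon):
    """Per-sample epsilon-insensitive loss: max(0, |error| - epsilon).

    Samples whose error is within epsilon contribute zero loss, which is
    why the fitted SVR model depends only on a subset of the training data.
    """
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

def mean_absolute_error(y_true, y_pred):
    """MAE: the average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical arousal scores and model predictions (not the study's data).
Y_TRUE = [1.0, 2.0, 3.0]
Y_PRED = [1.25, 2.0, 4.0]

if __name__ == "__main__":
    print(epsilon_insensitive_losses(Y_TRUE, Y_PRED, epsilon=0.5))
    print(mean_absolute_error(Y_TRUE, Y_PRED))
```

In this example the first two predictions fall inside the ${\varepsilon}$ tube and incur no loss, while all three still count toward the MAE, which shows that the training loss and the evaluation metric are different quantities.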

NFC-based Smartwork Service Model Design (NFC 기반의 스마트워크 서비스 모델 설계)

  • Park, Arum;Kang, Min Su;Jun, Jungho;Lee, Kyoung Jun
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.157-175 / 2013
  • Since the Korean government announced its 'Smartwork promotion strategy' in 2010, Korean firms and government organizations have started to adopt smartwork. However, smartwork has been implemented only in a few large enterprises and government organizations rather than in SMEs (small and medium-sized enterprises). In the USA, both Yahoo! and Best Buy have stopped their flexible work programs because of reported low productivity and job loafing problems. In addition, from the literature on smartwork, we can identify obstacles to smartwork adoption and categorize them into three types: institutional, organizational, and technological. The institutional obstacles include the difficulty of defining smartwork performance evaluation metrics, the lack of readiness of organizational processes, the limited range of smartwork types and models, the lack of employee participation in the smartwork adoption procedure, the high cost of building a smartwork system, and insufficient government support. The organizational obstacles include the limitations of the organizational hierarchy, misperceptions among employees and employers, difficulty in close collaboration, low productivity with remote coworkers, insufficient understanding of remote working, and lack of training on smartwork. The technological obstacles include security concerns about mobile work, the lack of specialized solutions, and the lack of adoption and operation know-how. To overcome the current problems of smartwork in practice and the obstacles reported in the literature, we suggest a novel smartwork service model based on NFC (Near Field Communication). The proposed NFC-based Smartwork Service Model is composed of an NFC-based smartworker networking service and an NFC-based smartwork space management service. The NFC-based smartworker networking service comprises an NFC-based communication/SNS service and an NFC-based recruiting/job-seeking service. 
The NFC-based communication/SNS service model supplements the key shortcomings of existing smartwork service models. By connecting to a company's existing legacy system through NFC tags and systems, it can overcome low productivity and the difficulties of collaboration and attendance management: managers can obtain employees' work processing, work time, and work space information, and employees can communicate with coworkers in real time and obtain coworkers' location information. In short, this service model features affordable system cost, provision of location-based information, and the possibility of knowledge accumulation. The NFC-based recruiting/job-seeking service provides new value by linking the NFC tag service with sharing economy sites. This service model features easy attachment and removal of the service, efficient space-based work provision, easy search of location-based recruiting/job-seeking information, and system flexibility. It combines the advantages of sharing economy sites with the advantages of NFC. Through cooperation with sharing economy sites, the model can provide recruiters with human resources seeking not only long-term but also short-term work. Additionally, SMEs can easily find job seekers by attaching NFC tags to any space where qualified human resources may be located. In short, this service model supports efficient distribution of human resources by providing the locations of job hunters and job applicants. The NFC-based smartwork space management service can promote smartwork by linking NFC tags attached to the work space with the existing smartwork system. This service features low cost, provision of indoor and outdoor location information, and customized service. In particular, this model can help small companies adopt a smartwork system because it is lightweight and cost-effective compared with existing smartwork systems. 
This paper presents the scenarios of the service models, the roles and incentives of the participants, and a comparative analysis. The superiority of the NFC-based smartwork service model is shown by comparing the new service models with the existing ones. The service model can expand the range of enterprises and organizations that adopt smartwork and the range of employees who take advantage of it.

A Study on the Characteristics of Enterprise R&D Capabilities Using Data Mining (데이터마이닝을 활용한 기업 R&D역량 특성에 관한 탐색 연구)

  • Kim, Sang-Gook;Lim, Jung-Sun;Park, Wan
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.1-21 / 2021
  • As the global business environment changes, uncertainties in technology development and market needs increase, and competition among companies intensifies, interest in and demand for the R&D activities of individual companies are increasing. To cope with these environmental changes, R&D companies are strengthening R&D investment as a means of enhancing the qualitative competitiveness of R&D while also paying more attention to facility investment. As a result, facility and R&D investment inevitably become a burden that R&D companies must bear under future uncertainty. Indeed, a management strategy of increasing R&D investment as a means of enhancing R&D capability is highly uncertain in terms of corporate performance. In this study, the structural factors that influence the R&D capabilities of companies are explored in terms of technology management capabilities, R&D capabilities, and corporate classification attributes by using data mining techniques, and the characteristics that these individual factors show at different levels of R&D capability are analyzed. This study also presents cluster analysis and experimental results based on evidence data for all domestic R&D companies, and is expected to provide important implications for corporate management strategies to enhance the R&D capabilities of individual companies. For the three viewpoints, 7, 2, and 4 detailed evaluation indexes, respectively, were composed to quantitatively measure individual levels in the corresponding area. For technology management capability and R&D capability, the sub-item evaluation indexes currently used by domestic technology evaluation agencies were referenced, and the final detailed evaluation indexes were newly constructed in consideration of whether the data could be obtained quantitatively. For corporate classification attributes, the most basic corporate classification profile information was considered. 
In particular, to grasp the homogeneity of the R&D competency level, a comprehensive score was given to each company using the detailed evaluation indicators of technology management capability and R&D capability, and the competency level was classified into five grades and compared with the cluster analysis results. To interpret the comparison between the analyzed clusters and the competency grades, we searched for clusters with high and low R&D competency levels. The characteristics of those clusters were then analyzed by detailed evaluation indicator. Through this procedure, two clusters with a high R&D competency level and one with a low level were identified, while the remaining two clusters were similar with almost high incidence. As a result, in this study, individual characteristics by detailed evaluation index were analyzed for the two clusters with a high competency level and the one cluster with a low competency level. The results imply that the faster the replacement cycle of professional managers who can effectively respond to changes in technology and market demand, the more likely they are to contribute to enhancing R&D capabilities. In the case of a private company, it is necessary to increase the intensity of input into R&D capabilities by enhancing R&D personnel's sense of belonging to the company through conversion to a corporate company, and to clarify responsibility and authority through organization into team units. Because technology commercialization achievements and technology certifications occur both in cases that contribute to capacity improvement and in cases that do not, it was confirmed that there is a limit to treating them as important factors for enhancing R&D capacity from a management perspective. 
Lastly, experience with utility model filing was identified as a factor that has an important influence on R&D capability, which confirms the need to provide motivation that encourages utility model filings in order to enhance R&D capability. As such, the results of this study are expected to provide important implications for corporate management strategies to enhance individual companies' R&D capabilities.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the respiratory disease caused by SARS-CoV-2) were published. The rapid increase in the number of COVID-19 papers places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text data of an extensive literature using the LDA and Word2vec algorithms. Papers related to the keywords to be searched were extracted from the COVID-19 papers, and detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic and updated weekly. The research method is divided into two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers that include full text. For this purpose, the number of COVID-19 publications by year was analyzed through exploratory data analysis in Python, and the top 10 journals with active research were identified. The LDA and Word2vec algorithms were used to derive research topics related to COVID-19, and after related words were analyzed, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected set of papers, detailed topics were analyzed using the LDA and Word2vec algorithms, and a clustering method based on PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy result of this study is that topics that were not derived from the topic modeling of all COVID-19 papers were found in the topic modeling results for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and it is said to play an important role in the production of therapeutic agents and in vaccine development. Likewise, topic extraction from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against the attack. Hidden topics that could not be found in the whole body of papers were thus uncovered by classifying the papers by keyword and performing topic modeling to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from the central word. The combination of the LDA model and the Word2vec model was intended to improve performance by identifying both the relationship between documents and LDA topics and the Word2vec document relationships. In addition, as a clustering method based on PCA dimension reduction, we presented a way of intuitively classifying documents by using the t-SNE technique to group documents with similar themes into a structured organization. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of COVID-19 papers, this approach is expected to save the precious time and effort of healthcare professionals and policy makers and to help them rapidly gain new insights. It is also expected to be used as basic data for researchers exploring new research directions.
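
Two ideas from this abstract lend themselves to a short Python sketch: the Skip-gram view of text as (center word, context word) pairs, and similarity between word vectors. The sketch below generates Skip-gram training pairs from a token list and computes cosine similarity between two vectors. The window size, the sample tokens, and the choice of cosine similarity are assumptions; the abstract does not specify them, and no Word2vec model is trained here.

```python
import math

def skipgram_pairs(tokens, window=2):
    """List (center, context) pairs: each word paired with its neighbors
    within `window` positions, as used to train a Skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical tokenized sentence (not taken from the CORD-19 data).
TOKENS = ["neutralizing", "antibodies", "block", "infection"]

if __name__ == "__main__":
    for center, context in skipgram_pairs(TOKENS, window=1):
        print(center, "->", context)
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

A trained model would learn one vector per word from such pairs, and words that share contexts would end up with a high cosine similarity.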

