• Title/Summary/Keyword: MODELS


A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.25-38 / 2019
  • Selecting high-quality information that meets users' interests and needs among overflowing content is becoming ever more important. In this flood of information, efforts are being made to better reflect the user's intention in search results rather than treating the information request as a simple string, and large IT companies such as Google and Microsoft are focusing on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance, in particular, is a field where text data analysis is expected to be useful and promising, because it constantly generates new information, and the earlier the information is obtained, the more valuable it is. Automatic knowledge extraction can be effective in areas such as the financial sector, where the information flow is vast and new information continually emerges. However, automatic knowledge extraction faces several practical difficulties. First, it is hard to build corpora from different fields with the same algorithm, and extracting good-quality triples is difficult. Second, producing labeled text data manually becomes harder as the extent and scope of knowledge increase and patterns are constantly updated. Third, performance evaluation is difficult due to the characteristics of unsupervised learning. Finally, defining the problem of automatic knowledge extraction is not easy because of the ambiguous conceptual characteristics of knowledge. To overcome these limitations and improve the semantic performance of stock-related information search, this study extracts knowledge entities using a neural tensor network and evaluates its performance. Unlike previous studies, the purpose of this study is to extract knowledge entities related to individual stock items. Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and enhance the model's effectiveness. This study has three significances. First, it presents a practical and simple automatic knowledge extraction method. Second, it demonstrates the possibility of performance evaluation through a simple problem definition. Finally, it increases the expressiveness of the extracted knowledge by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented. For the empirical study, experts' reports on 30 individual stocks, the top 30 items by publication frequency from May 30, 2017 to May 21, 2018, are used. The total number of reports is 5,600; 3,074 reports (about 55%) are designated as the training set and the remaining 45% as the test set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using a neural tensor network, one score function per stock is trained. Thus, when a new entity from the test set appears, its score is calculated with every score function, and the stock whose function yields the highest score is predicted as the item related to the entity. To evaluate the presented model, we confirm its predictive power and whether the score functions are well constructed by calculating the hit ratio over all reports in the test set. As a result, the presented model shows 69.3% hit accuracy on the test set of 2,526 reports, which is meaningfully high despite several research constraints. Looking at the prediction performance for each stock, only three stocks, LG ELECTRONICS, KiaMtr, and Mando, perform far below average, possibly due to interference from other similar items and the generation of new knowledge. In this paper, we propose a methodology to find the key entities, or combinations of them, that are necessary to search for information related to the user's investment intention. Graph data are generated using only a named entity recognition tool and applied to the neural tensor network without a field-specific learning corpus or word vectors. The empirical test confirms the effectiveness of the presented model as described above, though limitations remain: most notably, the markedly poor performance on a few stocks calls for further research. Finally, the empirical study confirms that the learning method presented here can be used to semantically match new text information with the related stocks.
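As a rough illustration of the scoring step described above (a minimal sketch, not the authors' implementation; the embedding dimension, the number of tensor slices k, and the stock names are assumptions), a neural tensor network score function combines a bilinear tensor term with a linear term, and prediction picks the stock whose function scores the new entity highest:

    import numpy as np

    # Minimal sketch of one neural-tensor-network score function (one per stock);
    # all shapes and names here are illustrative assumptions.
    class NTNScore:
        def __init__(self, dim, k, rng):
            self.W = rng.normal(0, 0.1, (k, dim, dim))  # bilinear tensor slices
            self.V = rng.normal(0, 0.1, (k, dim))       # linear term
            self.b = np.zeros(k)
            self.u = rng.normal(0, 0.1, k)              # combines the k slice outputs

        def score(self, e):
            bilinear = np.einsum('i,kij,j->k', e, self.W, e)  # e^T W_k e per slice
            return float(self.u @ np.tanh(bilinear + self.V @ e + self.b))

    rng = np.random.default_rng(0)
    dim, k = 100, 4                        # top-100 one-hot entity vector, k slices
    models = {name: NTNScore(dim, k, rng) for name in ['StockA', 'StockB']}

    e = np.zeros(dim); e[7] = 1.0          # one-hot vector of a new entity
    predicted = max(models, key=lambda name: models[name].score(e))
    print(predicted)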

Development of New Variables Affecting Movie Success and Prediction of Weekly Box Office Using Them Based on Machine Learning (영화 흥행에 영향을 미치는 새로운 변수 개발과 이를 이용한 머신러닝 기반의 주간 박스오피스 예측)

  • Song, Junga;Choi, Keunho;Kim, Gunwoo
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.67-83 / 2018
  • The Korean film industry, which had grown significantly every year, finally exceeded a cumulative audience of 200 million in 2013. Starting from 2015, however, the industry entered a period of low growth and eventually recorded negative growth in 2016. To overcome this difficulty, stakeholders such as production companies, distributors, and multiplex chains have attempted to maximize market returns with strategies for predicting market changes and responding to them immediately. Since a film is an experiential product, it is not easy to predict its box office record or the initial number of audiences before release, and the number of audiences fluctuates with a variety of factors after release. Production and distribution companies therefore try to secure a guaranteed number of screens from the multiplex chains at the opening of a new release. The multiplex chains, however, tend to set the screening schedule only one week at a time and then determine the number of screenings for the following week based on the box office record and audience evaluations. Many previous studies have dealt with predicting the box office records of films. Early research attempted to identify the factors affecting the box office record, and more recent studies have applied various analytic techniques to the previously identified factors in order to improve prediction accuracy and explain the effect of each factor, rather than identifying new factors. However, most previous research is limited in that it used the total number of audiences from opening to the end of the run as the target variable, which makes it difficult to predict and respond to dynamically changing market demand. The purpose of this study is therefore to predict the weekly number of audiences of a newly released film so that stakeholders can respond flexibly and elastically to changes in attendance. To that end, we considered the factors affecting box office used in previous studies and developed new factors not used before, such as the order of opening of movies and the dynamics of sales. With these comprehensive factors, we used machine learning methods, namely Random Forest, Multi-Layer Perceptron, Support Vector Machine, and Naive Bayes, to predict the cumulative number of visitors from the first to the third week after release. At the end of the first and second weeks, we predicted the cumulative number of visitors for the forthcoming week of a released film, and at the end of the third week, we predicted the film's total number of visitors. In addition, we predicted the total number of cumulative visitors at both the first and second weeks using the same factors. As a result, we found that predicting the number of visitors for the forthcoming week was more accurate than predicting the total number of visitors in all three weeks, and that Random Forest achieved the highest accuracy among the machine learning methods used. This study has implications in that it 1) comprehensively considered factors affecting the box office record that were rarely addressed in previous research, such as the weekly audience rating, the weekly rank of the film, and the weekly sales share after release, and 2) suggested models that predict the weekly number of audiences of newly released films, allowing stakeholders to predict and respond flexibly and elastically to dynamically changing market demand.
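The weekly prediction setup described above could look roughly like the following sketch (synthetic data and hypothetical feature names stand in for the study's factors; this is not the authors' code), using the Random Forest that performed best in their comparison:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    # Predict next week's cumulative audience from comprehensive weekly factors,
    # including the kinds of new variables described (opening order, weekly
    # sales share, weekly rating/rank). All data below are synthetic.
    rng = np.random.default_rng(0)
    n = 500
    X = np.column_stack([
        rng.integers(1, 20, n),        # order of opening among concurrent films
        rng.uniform(0, 1, n),          # weekly sales share
        rng.uniform(0, 10, n),         # weekly audience rating
        rng.integers(1, 50, n),        # weekly box office rank
        rng.lognormal(12, 1, n),       # cumulative audience so far
    ])
    y = X[:, 4] * rng.uniform(1.1, 2.0, n)   # next week's cumulative audience

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    print('R^2 on held-out films:', round(rf.score(X_te, y_te), 3))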

Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.161-177 / 2019
  • In this paper, we study how to improve answer extraction in a Question-Answering system by using sentence dependency parsing results. A Question-Answering (QA) system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from documents; various studies have been conducted on both components. To improve the performance of answer extraction, the grammatical information of sentences must be accurately reflected. In Korean, because word order is free and omission of sentence components is frequent, dependency parsing is a good way to analyze syntax. Therefore, in this study, we improved answer extraction performance by adding features generated from dependency parsing to the inputs of the answer extraction model (Bidirectional LSTM-CRF). We compared the performance of the answer extraction model when given only basic word features generated without dependency parsing against its performance when the Eojeol tag feature and the dependency graph embedding feature were added. Since dependency parsing is performed on the Eojeol, the basic unit of a Korean sentence separated by spaces, the tag information of each Eojeol is obtained as a result of parsing; the Eojeol tag feature carries this tag information. Generating the dependency graph embedding consists of building the dependency graph from the parsing result and learning an embedding of the graph. From the parsing result, a graph is generated by mapping each Eojeol to a node, each dependency between Eojeols to an edge, and each Eojeol tag to a node label. In this process, an undirected or a directed graph is generated depending on whether the direction of the dependency relation is considered. To obtain the embedding of the graph, we used Graph2Vec, a method that derives a graph's embedding from the subgraphs constituting it. The maximum path length between nodes can be specified when finding the subgraphs: if it is 1, the graph embedding is generated only from direct dependencies between Eojeols, and indirect dependencies are included as the maximum path length grows. In the experiments, the maximum path length between nodes was varied from 1 to 3, with and without considering the direction of dependency, and answer extraction performance was measured. The results show that both the Eojeol tag feature and the dependency graph embedding feature improve answer extraction performance. In particular, the highest performance was achieved when the direction of the dependency relation was considered and the dependency graph embedding was generated with a maximum path length of 1 in Graph2Vec's subgraph extraction. From these experiments, we conclude that it is better to take the direction of dependency into account and to consider only direct connections rather than indirect dependencies between words. The significance of this study is as follows. First, we improved answer extraction performance by adding features based on dependency parsing results, taking into account the characteristics of Korean, namely its free word order and frequent omission of sentence components. Second, we generated features from dependency parsing results with a learning-based graph embedding method, without manually defining patterns of dependency between Eojeols. Future research directions are as follows. In this study, the features generated from dependency parsing were applied only to the answer extraction model. In the future, if performance gains are confirmed when applying these features to other natural language processing models such as sentiment analysis or named entity recognition, their validity can be verified more thoroughly.
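A sketch of the graph construction and embedding step, assuming the karateclub implementation of Graph2Vec (the parse format and Eojeol tags below are hypothetical, and note that this off-the-shelf implementation works on undirected graphs, whereas the paper's best variant used a directed graph):

    import networkx as nx
    from karateclub import Graph2Vec   # assumed: the karateclub package

    # Build one dependency graph per sentence: Eojeols become nodes (labeled
    # 0..n-1), head-dependent relations become edges, and the Eojeol tag is
    # stored as the node 'feature' attribute.
    def to_graph(parse):
        g = nx.Graph()                 # karateclub expects undirected graphs
        for idx, head, tag in parse:
            g.add_node(idx, feature=tag)
            if head >= 0:              # -1 marks the root Eojeol
                g.add_edge(idx, head)
        return g

    parses = [
        [(0, 1, 'NP_SBJ'), (1, 2, 'VP'), (2, -1, 'VP')],
        [(0, 2, 'NP_OBJ'), (1, 2, 'NP_AJT'), (2, -1, 'VP')],
    ]
    graphs = [to_graph(p) for p in parses]

    # wl_iterations=1 roughly mirrors the best setting reported above: only
    # direct, path-length-1 dependencies enter each subgraph feature.
    model = Graph2Vec(wl_iterations=1, attributed=True, dimensions=64,
                      epochs=20, min_count=1)
    model.fit(graphs)
    embeddings = model.get_embedding()   # one vector per sentence graph,
                                         # concatenated to the BiLSTM-CRF inputs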

Documentation of Intangible Cultural Heritage Using Motion Capture Technology Focusing on the documentation of Seungmu, Salpuri and Taepyeongmu (부록 3. 모션캡쳐를 이용한 무형문화재의 기록작성 - 국가지정 중요무형문화재 승무·살풀이·태평무를 중심으로 -)

  • Park, Weonmo;Go, Jungil;Kim, Yongsuk
    • Korean Journal of Heritage: History & Science / v.39 / pp.351-378 / 2006
  • With the development of media, the methods for documenting intangible cultural heritage have also developed and diversified. In addition to the earlier analogue ways of documentation, new multimedia technologies focusing on digital pictures, sound sources, movies, etc. have recently been applied. Among the new technologies, documentation using 'Motion Capture' has proved prominent, especially in fields that require three-dimensional documentation such as dances and performances. Motion capture refers to the documentation technology that records the signals of time-varying positions derived from sensors attached to the surface of an object. It converts the signals from the sensors into digital data that can be plotted as points on the virtual coordinates of a computer and records the movement of those points as the object moves over a certain period of time. By displaying digital data that represents the virtual motion of a holder of an intangible cultural heritage, it produces scientific data for preservation. The National Research Institute of Cultural Properties (NRICP) has been working on the development of a new documentation method for the Important Intangible Cultural Heritage designated by the Korean government, using motion capture equipment of the kind widely used for computer graphics in the movie and game industries. The project, supported by lottery funds, is designed to apply motion capture technology over three years, from 2005 to 2007, to 11 performances from 7 traditional dances whose body gestures have considerable value among the Important Intangible Cultural Heritage performances. In 2005, the first year of the project, data were accumulated for solo dances, such as Seungmu (monk's dance), Salpuri (a solo dance for spiritual cleansing), and Taepyeongmu (dance of peace), which are relatively easy in terms of performing skills. In 2006, group dances such as Jinju Geommu (Jinju sword dance), Seungjeonmu (dance for victory), and Cheoyongmu (dance of Lord Cheoyong) will be documented. In the last year of the project, 2007, an education programme for comparative studies, analysis, and transmission of intangible cultural heritage, together with three-dimensional contents for public service, will be devised based on the accumulated data, alongside the documentation of Hakyeonhwadae Habseolmu (crane dance combined with the lotus blossom dance). By describing the processes and results of the motion capture documentation of the Salpuri dance (Lee Mae-bang), Taepyeongmu (Kang Seon-young), and Seungmu (Lee Mae-bang, Lee Ae-ju, and Jung Jae-man) conducted in 2005, this report introduces a new approach to the documentation of intangible cultural heritage. During the first year of the project, two questions were raised. First, how can we capture the motions of a holder (dancer) without cutoffs during quite a long performance? After repeated tests, the motion capture system proved stable, producing continuous results. Second, how can we reproduce accurate motion without the re-targeting process? For the first time in Korea, the project derived digital data on the shape of each dancer's body before the motion capture process, re-creating the dancers' gestures with great accuracy. The accurate three-dimensional body models of the four holders obtained by body scanning enhanced the accuracy of the motion capture of the dances.

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data (스마트폰 다종 데이터를 활용한 딥러닝 기반의 사용자 동행 상태 인식)

  • Kim, Kilho;Choi, Sangwoo;Chae, Moon-jung;Park, Heewoong;Lee, Jaehong;Park, Jonghun
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.163-177 / 2019
  • As smartphones become more widely used, human activity recognition (HAR) tasks that recognize the personal activities of smartphone users from multimodal data have been actively studied. The research area is expanding from recognizing an individual user's simple body movements to recognizing low-level and high-level behavior. However, HAR tasks for recognizing interaction behavior with other people, such as whether the user is accompanying or communicating with someone else, have received less attention so far. Previous research on recognizing interaction behavior has usually depended on audio, Bluetooth, and Wi-Fi sensors, which are vulnerable to privacy issues and require much time to collect enough data. In contrast, physical sensors, including accelerometer, magnetic field, and gyroscope sensors, are less vulnerable to privacy issues and can collect a large amount of data within a short time. In this paper, we propose a deep learning based method for detecting accompanying status using only multimodal physical sensor data, such as accelerometer, magnetic field, and gyroscope readings. The accompanying status is defined as a redefinition of part of the user's interaction behavior: whether the user is accompanying an acquaintance at close distance and whether the user is actively communicating with that acquaintance. We propose a framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks for classifying accompaniment and conversation. First, we introduce a data preprocessing method consisting of time synchronization of multimodal data from different physical sensors, data normalization, and sequence data generation. We applied nearest-neighbor interpolation to synchronize the timestamps of data collected from different sensors. Normalization was performed on each x, y, z axis value of the sensor data, and sequence data were generated with the sliding window method. The sequence data then became the input to the CNN, which extracts feature maps representing local dependencies of the original sequence. The CNN consists of 3 convolutional layers and has no pooling layer, in order to maintain the temporal information of the sequence data. Next, the LSTM recurrent networks receive the feature maps, learn long-term dependencies from them, and extract features. The LSTM recurrent networks consist of two layers, each with 128 cells. Finally, the extracted features are used for classification by a softmax classifier. The loss function of the model is the cross entropy function, and the weights are randomly initialized from a normal distribution with a mean of 0 and a standard deviation of 0.1. The model is trained with the adaptive moment estimation (ADAM) optimization algorithm, with a mini-batch size of 128. We apply dropout to the input values of the LSTM recurrent networks to prevent overfitting. The initial learning rate is set to 0.001 and decays exponentially by a factor of 0.99 at the end of each training epoch. An Android smartphone application was developed and released to collect data from a total of 18 subjects. Using this data, the model classified accompaniment and conversation with 98.74% and 98.83% accuracy, respectively. Both the F1 score and the accuracy of the model were higher than those of a majority vote classifier, a support vector machine, and a deep recurrent neural network. In future research, we will focus on more rigorous multimodal sensor data synchronization methods that minimize timestamp differences. We will also study transfer learning methods that allow models trained on the training data to transfer to evaluation data that follows a different distribution. We expect this to yield a model with robust recognition performance against changes in data not considered during model learning.
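The architecture and training configuration described above could be sketched as follows (a minimal Keras sketch under stated assumptions, not the authors' code: the window length, channel count, filter sizes, dropout rate, and steps-per-epoch value are all illustrative):

    import tensorflow as tf
    from tensorflow.keras import layers

    # Sketch of the described CNN-LSTM: 3 conv layers with no pooling, dropout
    # on the LSTM input, two 128-cell LSTM layers, softmax output.
    win_len, n_channels, n_classes = 128, 9, 2     # e.g. 3 sensors x 3 axes
    steps_per_epoch = 100                          # hypothetical batches per epoch

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(win_len, n_channels)),
        layers.Conv1D(64, 5, padding='same', activation='relu'),
        layers.Conv1D(64, 5, padding='same', activation='relu'),
        layers.Conv1D(64, 5, padding='same', activation='relu'),
        layers.Dropout(0.5),                       # dropout applied to LSTM inputs
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(128),
        layers.Dense(n_classes, activation='softmax'),
    ])

    # Adam at 0.001, decayed exponentially by 0.99 at the end of each epoch.
    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        0.001, decay_steps=steps_per_epoch, decay_rate=0.99, staircase=True)
    model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.summary()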

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • Many studies have long been conducted in academia on predicting the success of campaigns for customers, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded in various ways due to the rapid revitalization of online channels, companies carry out various types of campaigns on a scale that cannot be compared to the past. However, customers tend to perceive campaigns as spam as fatigue from duplicate exposure increases. From a corporate standpoint, the effectiveness of campaigns is also decreasing: investment costs are rising while the actual success rate remains low. Accordingly, various studies are ongoing to improve campaign effectiveness in practice. The campaign system's ultimate purpose is to increase the success rate of various campaigns by collecting and analyzing customer-related data and using it for campaigns. In particular, recent attempts have been made to predict campaign response using machine learning. Selecting appropriate features is very important given the many features of campaign data. If all of the input data are used when classifying a large amount of data, learning time grows as the number of classification classes expands, so a minimal input data set must be extracted from the entire data. In addition, when a model is trained with too many features, prediction accuracy may be degraded by overfitting or by correlations between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step in analyzing a high-dimensional data set. Among greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), and SFFS (Sequential Floating Forward Selection) are widely used as traditional feature selection techniques, but when the number of features is large they suffer from poor classification performance and long learning times. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The purpose of this study is to improve the existing SFFS sequential method in searching for the feature subsets that underpin machine learning model performance, using the statistical characteristics of the data processed in the campaign system. Features that strongly influence performance are derived first and features with a negative effect are removed; the sequential method is then applied to increase search efficiency and enable generalized prediction. The proposed model showed better search and prediction performance than the traditional greedy algorithm: campaign success prediction was higher than with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, the improved feature selection algorithm was found to help analyze and interpret the prediction results by providing the importance of the derived features. Among the selected features were ones already known statistically to be important, such as age, customer rating, and sales. Unexpectedly, features that campaign planners had rarely used to select campaign targets, such as the combined product name, the average three-month data consumption rate, and the last three months' wireless data usage, were also selected as important for campaign response. It was confirmed that base attributes can be very important features depending on the type of campaign, making it possible to analyze and understand the important characteristics of each campaign type.
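The study's improved algorithm is not reproduced here; as a baseline illustration of the sequential (SFS-style) search it builds on, scikit-learn's SequentialFeatureSelector can be combined with a model whose feature importances support the kind of interpretation described (synthetic data stands in for campaign records):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SequentialFeatureSelector

    # Baseline sequential forward selection, the greedy family the study improves on.
    X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                               random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)

    sfs = SequentialFeatureSelector(clf, n_features_to_select=8,
                                    direction='forward', cv=3, n_jobs=-1)
    sfs.fit(X, y)
    selected = np.flatnonzero(sfs.get_support())
    print('selected feature indices:', selected)

    # Feature importances from the final model support interpreting campaign
    # drivers, as the study reports for features like age and customer rating.
    clf.fit(X[:, selected], y)
    print(dict(zip(selected.tolist(), np.round(clf.feature_importances_, 3))))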

A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
    • Journal of Intelligence and Information Systems / v.27 no.4 / pp.73-95 / 2021
  • This study uses the Node2vec graph embedding method and Light GBM link prediction to explore undeveloped export candidate countries for Korea's food and beverage industry. Node2vec improves on the structural equivalence representation of a network, which is known to be relatively weak in existing link prediction methods based on the number of common neighbors, and it is known to perform well at capturing both community structure and structural equivalence. The embedding vectors are obtained from fixed-length walks starting at designated nodes, which makes node sequences easy to apply as input to models for downstream tasks such as Logistic Regression, Support Vector Machine, and Random Forest. Based on these features of the Node2vec graph embedding method, this study applies it to international trade information for the Korean food and beverage industry, aiming to contribute to extensive-margin diversification for Korea in the industry's global value chain relationships. The optimal predictive model derived in this study recorded a precision of 0.95, a recall of 0.79, and an F1 score of 0.86, showing excellent performance. This was superior to the baseline model, a binary classifier based on Logistic Regression, which recorded a precision of 0.95, a recall of 0.73, and an F1 score of 0.83. In addition, the Light GBM based optimal prediction model outperformed the link prediction model of previous studies, set as the benchmark in this study: the previous model recorded a recall of only 0.75, whereas the proposed model achieved a recall of 0.79. The performance difference between the benchmark and this study's model is due to the model learning strategy. In this study, trades were grouped by trade value scale, and prediction models were trained differently for these groups. The specific methods were (1) randomly masking some trades and learning a model without any condition on trade value, (2) randomly masking some of the trades with above-average trade value, and (3) randomly masking some of the trades in the top 25% by trade value. The experiments confirmed that the model trained by randomly masking some of the trades with above-average trade value performed best and most stably. Additional investigation found that most of the potential export candidates for Korea derived from this model were appropriate. Taken together, this study demonstrates the practical utility of link prediction using Node2vec and Light GBM, and derives useful implications for weight update strategies that improve link prediction while training the model. This study also has policy utility because graph embedding based link prediction has rarely been applied to trade transactions. The results support a rapid response to changes in the global value chain, such as the recent US-China trade conflict or Japan's export regulations, and we believe they are sufficiently useful as a tool for policy decision-making.
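A minimal sketch of this embed-then-classify pipeline, assuming the node2vec pip package and LightGBM (the toy graph, Hadamard edge features, and all hyperparameters are illustrative assumptions, not the study's configuration):

    import networkx as nx
    import numpy as np
    from node2vec import Node2Vec          # assumed: the `node2vec` pip package
    from lightgbm import LGBMClassifier

    # Embed the trade network, form edge features from node vectors, and train
    # a LightGBM link classifier on observed links vs. sampled non-links.
    G = nx.gnm_random_graph(60, 240, seed=1)   # stand-in for the trade network

    n2v = Node2Vec(G, dimensions=32, walk_length=10, num_walks=50, seed=1)
    wv = n2v.fit(window=5, min_count=1).wv     # gensim KeyedVectors

    def edge_vec(u, v):
        return wv[str(u)] * wv[str(v)]         # Hadamard edge embedding

    pos = list(G.edges())                      # observed links ("unmasked" trades)
    neg = list(nx.non_edges(G))[:len(pos)]     # sampled non-links
    X = np.array([edge_vec(u, v) for u, v in pos + neg])
    y = np.array([1] * len(pos) + [0] * len(neg))

    clf = LGBMClassifier(n_estimators=200).fit(X, y)
    # Candidate countries would be the highest-probability non-edges for Korea's node.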

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the severe respiratory disease caused by the SARS-CoV-2 coronavirus) were published. The rapid increase in the number of COVID-19 papers places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text of this extensive literature using the LDA and Word2vec algorithms. Papers related to the keywords to be searched were extracted from the COVID-19 papers, and detailed topics were identified. The data were the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. The number of COVID-19 publications per year was analyzed through exploratory data analysis in Python, and the top 10 most actively publishing journals were identified. The LDA and Word2vec algorithms were used to derive COVID-19 research topics, and after analyzing the related words, their similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from the topics derived from all papers: a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collection, detailed topics were analyzed using the LDA and Word2vec algorithms, and a clustering method with PCA dimension reduction and the t-SNE algorithm was applied to visualize groups of papers with similar themes. A noteworthy result is that topics which did not emerge in the topic modeling of all COVID-19 papers were derived in the topic modeling of the keyword-specific collections. For example, topic modeling of the 'vaccine' papers extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body and is said to play an important role in developing therapeutic agents and vaccines. Likewise, extracting topics from the 'treatment' papers revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against the pathogen. Hidden topics that could not be found over the entire corpus were thus uncovered by classifying papers according to keywords and performing topic modeling on each subset. In this study, we proposed extracting topics from a large body of literature with the LDA algorithm and extracting similar words with the Skip-gram variant of Word2vec, which predicts surrounding words from a central word. Combining the LDA and Word2vec models aims at better performance by capturing both the document-topic relationships from LDA and the word-similarity relationships from Word2vec. In addition, we presented a method for intuitively classifying documents: clustering after PCA dimension reduction and using the t-SNE technique to group documents with similar themes into a structured organization. At a time when the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of related papers, we hope this will save healthcare professionals and policy makers precious time and effort and help them rapidly gain new insights. It is also expected to serve as basic data for researchers exploring new research directions.
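The two-step LDA-plus-skip-gram analysis could be sketched with gensim as follows (a minimal sketch on a tiny illustrative corpus, not the study's pipeline or parameters):

    from gensim import corpora, models

    # LDA for topics, skip-gram Word2vec for similar terms.
    docs = [
        ['vaccine', 'antibody', 'neutralizing', 'trial'],
        ['treatment', 'cytokine', 'storm', 'therapy'],
        ['vaccine', 'dose', 'immune', 'response'],
    ]
    dictionary = corpora.Dictionary(docs)
    bow = [dictionary.doc2bow(d) for d in docs]

    lda = models.LdaModel(bow, num_topics=2, id2word=dictionary, passes=20,
                          random_state=0)
    for topic in lda.print_topics():
        print(topic)

    # Skip-gram (sg=1) predicts context words from a center word; words similar
    # to a topic keyword can then be inspected.
    w2v = models.Word2Vec(docs, vector_size=50, sg=1, min_count=1, seed=0)
    print(w2v.wv.most_similar('vaccine', topn=3))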

Analysis of promising countries for export using parametric and non-parametric methods based on ERGM: Focusing on the case of information communication and home appliance industries (ERGM 기반의 모수적 및 비모수적 방법을 활용한 수출 유망국가 분석: 정보통신 및 가전 산업 사례를 중심으로)

  • Jun, Seung-pyo;Seo, Jinny;Yoo, Jae-Young
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.175-196 / 2022
  • The information and communication and home appliance industries, once among South Korea's main industries, are gradually losing export share as their export competitiveness weakens. This study objectively analyzed export competitiveness and suggested export-promising countries to help South Korea's information communication and home appliance industries improve exports. Network properties, centrality, and structural holes were analyzed to evaluate export competitiveness. To select promising export countries, we proposed a new variable that takes into account the characteristics of the already established International Trade Network (ITN), that is, the Global Value Chain (GVC), in addition to existing economic factors. The conditional log-odds of individual links derived from an Exponential Random Graph Model (ERGM) of the cross-border trade network were used as a proxy variable for export potential. Both a parametric and a non-parametric approach based on the ERGM link estimates were used to recommend export-promising countries. In the parametric method, a regression analysis model was developed to predict the export value of South Korea's information and communication and home appliance industries, adding the link-specific network characteristics derived from the ERGM to the existing economic factors. In the non-parametric approach, an anomaly detection algorithm based on clustering was used, and promising export countries were proposed by finding outliers that deviate from their peers. According to the results, the export network of the industry is structurally a network with high transitivity. The centrality analysis showed that South Korea's influence on exports was weak relative to its size, and the structural hole analysis showed that its export efficiency was weak. According to the proposed model for recommending promising export countries, the parametric analysis identified Iran, Ireland, North Macedonia, Angola, and Pakistan as promising, while the non-parametric analysis identified Qatar, Luxembourg, Ireland, North Macedonia, and Pakistan; the two models differed for some countries. The results revealed that the export competitiveness of South Korea's information and communication and home appliance industries in the GVC was not high relative to their export size, suggesting that exports could decline further. This study is also meaningful in that it proposed a way to find promising export countries by considering GVC networks with other countries as a means of increasing export competitiveness. From a policy point of view, the study showed that the international trade network of these industries involves important mutual relationships and that, although transitivity is high, dyadic ties may not easily expand into three-party relationships. It also confirmed that South Korea's export competitiveness and status were lower than its export size ranking. The paper suggested that, to improve its low out-degree centrality, South Korea needs to increase exports to Italy or Poland, which have significantly higher in-degree centrality. In addition, we argued that to improve out-closeness centrality, it is necessary to increase exports to countries with particularly high in-closeness centrality; Morocco, the UAE, Argentina, Russia, and Canada were identified as export destinations deserving attention. The study also provides practical implications for companies expecting to expand exports: such companies should pay attention to countries whose potential for export expansion is high relative to their existing export volume. In particular, for companies exporting daily necessities, countries with large populations are suggested, and for companies exporting high-end or durable products, countries with high GDP (purchasing power) but relatively low current exports are suggested. Since the process and results of this study can easily be extended to other industries, services in the public sector are also expected to be developed using these results.
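A rough sketch of the parametric step described above (synthetic data only; ERGMs themselves are typically fitted elsewhere, e.g. with R's statnet/ergm package, so the conditional log-odds column is assumed to be precomputed, and the economic factors are hypothetical):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Regress export value on economic factors plus the ERGM conditional
    # log-odds of each link, then rank countries by the prediction gap.
    rng = np.random.default_rng(0)
    n = 200
    log_gdp = rng.normal(10, 1, n)            # importer economic factor
    distance = rng.uniform(500, 15000, n)     # bilateral distance (km)
    ergm_logodds = rng.normal(0, 1, n)        # link potential from the ERGM
    export_value = (0.5 * log_gdp - 0.0001 * distance
                    + 0.8 * ergm_logodds + rng.normal(0, 0.3, n))

    X = np.column_stack([log_gdp, distance, ergm_logodds])
    reg = LinearRegression().fit(X, export_value)

    # Promising countries: largest positive gap between the export value the
    # model says the network position supports and the exports actually realized.
    gap = reg.predict(X) - export_value
    print(np.argsort(gap)[-5:])               # indices of the 5 most promising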

The Effects of Self-regulatory Resources and Construal Levels on the Choices of Zero-cost Products (자아조절자원 및 해석수준이 공짜대안 선택에 미치는 영향)

  • Lee, Jinyong;Im, Seoung Ah
    • Asia Marketing Journal / v.13 no.4 / pp.55-76 / 2012
  • Most people prefer to choose zero-cost products that they can get without paying any money. This 'zero-cost effect' can be explained with a 'zero-cost model' in which consumers attach special value to zero-cost products, in a departure from standard economic models (Shampanier, Mazar, and Ariely 2007). If two products at regular prices of ₩200 and ₩400 simultaneously offer ₩200 discounts, their prices become ₩0 and ₩200, respectively. Although the price gap between the two products remains the same after the ₩200 discounts, people are much more likely to select the free alternative than the same product at ₩200. While prior studies have examined the 'zero-cost effect' in isolation from other factors, this study investigates the moderating effects of self-regulatory resources and construal levels on the selection of free products. Self-regulatory resources allow people to control or regulate their behavior; however, because these resources are limited, they are easily depleted when exerted (Muraven, Tice, and Baumeister 1998). Without such resources, consumers tend to become less sensitive to price changes and to spend money more extravagantly (Vohs and Faber 2007). Under this condition they are also likely to invest less effort in information processing and to make more intuitive decisions (Pocheptsova, Amir, Dhar, and Baumeister 2009). Therefore, context effects such as price changes and zero-cost effects are less likely under resource depletion. In addition, construal levels have profound effects on information processing (Trope and Liberman 2003, 2010). At a high construal level, people tend to attune their minds to core features and desirability aspects, whereas at a low construal level they are more likely to process information based on secondary features and feasibility aspects (Khan, Zhu, and Kalra 2010). A product's perceived value is more related to desirability, whereas a zero cost or a price level is more associated with feasibility. Thus, context effects and reliance on feasibility (for instance, the zero-cost effect) should be diminished at a high construal level while remaining at a low construal level. These two factors can therefore influence the magnitude of the 'zero-cost effect' when people make decisions. This study ran two experiments to investigate the effects of self-regulatory resources and construal levels on the selection of a free product. Kisses and Ferrero-Rocher, which were adopted in the prior study (Shampanier et al. 2007), were used as alternatives in Experiments 1 and 2. Experiment 1 was designed to test whether self-regulatory resource depletion moderates the zero-cost effect. The level of self-regulatory resources was manipulated with two different tasks: a Sudoku task in the depletion condition and a diagram-drawing task in the non-depletion condition. Upon completing the manipulation task, subjects were randomly assigned either to a decision set with a zero-cost option (Kisses ₩0 and Ferrero-Rocher ₩200) or to a set without one (Kisses ₩200 and Ferrero-Rocher ₩400). The pairs of alternatives in the two decision sets have the same ₩200 price gap between the low-priced Kisses and the high-priced Ferrero-Rocher. Subjects in the non-depletion condition selected Kisses over Ferrero-Rocher more often when Kisses was free (71.88%) than when it was priced at ₩200 (34.88%). However, the zero-cost effect disappeared when people lacked self-regulatory resources. Experiment 2 was conducted to investigate whether construal levels influence the magnitude of the 'zero-cost effect'. To manipulate construal levels, four 'why' (high construal level condition) or 'how' (low construal level condition) questions about health management were asked. Subjects were presented with four boxes connected by downward arrows; the box at the top contained the question 'Why do I maintain good physical health?' or 'How do I maintain good physical health?' Subjects wrote a response to the question of why or how they would maintain good physical health, and similar tasks were repeated for the 2nd, 3rd, and 4th responses. After the manipulation task, subjects were randomly assigned either to a decision set with a zero-cost option or to a set without one, as in Experiment 1. When a low construal level was primed with 'how', subjects chose free Kisses over Ferrero-Rocher (60.66%) more often than they chose ₩200 Kisses over ₩400 Ferrero-Rocher (42.19%). In contrast, the zero-cost effect was no longer observed when a high construal level was primed with 'why'.

