• Title/Summary/Keyword: Two Systems

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, the period in which K-IFRS was applied in earnest, to predict default risk. The data set totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most prior studies, which used the default event as the basis for learning about default risk, this study calculated default risk from each company's market capitalization and stock price volatility using the Merton model. This resolved two limitations of the existing methodology: the data imbalance caused by the scarcity of default events, and the difficulty of reflecting differences in default risk among ordinary companies. Because the models were trained only on corporate information that is also available for unlisted companies, default risk can be derived appropriately for unlisted companies that have no stock price information. The approach can therefore provide stable default risk assessment for unlisted companies, such as small and medium-sized companies and startups, whose default risk is difficult to determine with traditional credit rating models. Although predicting corporate default risk with machine learning has been studied actively in recent years, model bias remains an issue because most studies make predictions with a single model. A stable and reliable valuation methodology is required for calculating default risk, given that an entity's default risk information is used very widely in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation methods.
The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience on credit ratings and changes in future market conditions. This study reduced the bias of individual models by using a stacking ensemble technique that synthesizes various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information while keeping the advantage of machine learning-based default risk prediction models, which take less time to calculate. To produce the sub-model forecasts used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which performed best among the single models. Next, to check for statistically significant differences between the forecasts of the Stacking Ensemble model and those of each individual model, pairs were constructed between the Stacking Ensemble model and each individual model. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank sum test was used to check whether the two sets of forecasts in each pair differed significantly. The analysis showed that the forecasts of the Stacking Ensemble model differed significantly from those of the MLP and CNN models.
In addition, this study offers a methodology that allows existing credit rating agencies to adopt machine learning-based bankruptcy risk prediction, given that traditional credit rating models can also be included as sub-models when calculating the final default probability. The Stacking Ensemble technique proposed in this study can also help meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical use by overcoming and improving the limitations of existing machine learning-based models.
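The seven-way split described above, in which each sub-model forecasts a held-out piece after training on the others, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the two toy sub-models (`fit_mean`, `fit_line`), the data, and all function names are assumptions made here, and the abstract does not say which meta-model consumes the resulting forecasts.

```python
from statistics import mean

def k_fold_indices(n, k=7):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean(X, y):
    """Toy sub-model (assumption): always predicts the training mean."""
    m = mean(y)
    return lambda row: m

def fit_line(X, y):
    """Toy sub-model (assumption): least-squares line on the first feature."""
    xs = [row[0] for row in X]
    mx, my = mean(xs), mean(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx if sxx else 0.0
    return lambda row: my + slope * (row[0] - mx)

def out_of_fold(fit, X, y, k=7):
    """Forecast each fold with a model trained on the other k-1 folds."""
    preds = [None] * len(X)
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = model(X[i])
    return preds

# Hypothetical data: one feature, target = 2 * feature.
X = [[float(i)] for i in range(14)]
y = [2.0 * row[0] for row in X]

# One column per sub-model; a meta-model would be trained on these rows.
meta_features = list(zip(out_of_fold(fit_mean, X, y), out_of_fold(fit_line, X, y)))
```

Because every forecast in `meta_features` comes from a model that never saw that row, a meta-model trained on these rows does not simply learn to trust whichever sub-model overfits the most.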

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.57-78 / 2020
  • With the development of the Internet, consumers can easily check product information through E-Commerce. Product reviews used in the purchasing process are based on user experience, allowing consumers to act as producers of information as well as to refer to it. Reviews can increase the efficiency of purchasing decisions from the consumer's perspective, and from the seller's point of view they can help develop products and strengthen competitiveness. However, it takes a lot of time and effort to read the vast number of product reviews offered by E-Commerce and to grasp the overall assessment of the products to be compared along the assessment dimensions that the consumer considers important. This is because product reviews are unstructured information, and the sentiment and assessment dimension of a review cannot be read off immediately. For example, consumers who want to purchase a laptop would like to check the assessment of the compared products on each dimension, such as performance, weight, delivery, speed, and design. Therefore, this paper proposes a method to automatically generate multi-dimensional product assessment scores from the reviews of the products to be compared. The method consists of two phases: a pre-preparation phase and an individual product scoring phase. In the pre-preparation phase, a dimensioned classification model and a sentiment analysis model are created from reviews of the large category product group. By combining word embedding and association analysis, the dimensioned classification model compensates for a limitation of existing studies, in which word embedding methods for finding the relevance between dimensions and words consider only the distance between words in sentences.
The sentiment analysis model is a CNN model trained on data tagged as positive or negative at the phrase level for accurate polarity detection. The individual product scoring phase then applies the pre-prepared models to reviews split into phrases. Phrases judged to describe a specific dimension are grouped, and multi-dimensional assessment scores are obtained by aggregating them by assessment dimension according to their proportions. In the experiment, approximately 260,000 reviews of the large category product group were collected to build the dimensioned classification model and the sentiment analysis model. In addition, reviews of laptops from companies S and L sold through E-Commerce were collected and used as experimental data. The dimensioned classification model classified individual product reviews, broken down into phrases, into six assessment dimensions, combining the existing word embedding method with an association analysis that indicates the frequency between words and dimensions. Combining word embedding and association analysis increased the accuracy of the model by 13.7%. The sentiment analysis model analyzed assessments more closely when trained at the phrase level rather than the sentence level; its accuracy was 29.4% higher than that of the sentence-based model. This study suggests that both sellers and consumers can expect more efficient decision making in purchasing and product development, because they can compare products along multiple dimensions. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme and analysed together using word embedding and association analysis, which improves the objectivity of the multi-dimensional analysis.
This will be an attractive analysis model, not only enabling more effective service deployment amid the evolving E-Commerce market and fierce competition, but also satisfying both sellers and consumers.
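The scoring phase described above, which aggregates phrase-level results by assessment dimension, might look like the following sketch. It assumes each phrase has already been labeled with a dimension and a polarity by the two pre-prepared models; scoring a dimension as its share of positive phrases is an assumption made here, as are the function name and the sample labels.

```python
from collections import defaultdict

def dimension_scores(phrases):
    """Aggregate (dimension, polarity) labels into one score per dimension.

    Assumption: the score is the share of positive phrases in the dimension.
    """
    counts = defaultdict(lambda: [0, 0])  # dimension -> [positive, total]
    for dimension, polarity in phrases:
        counts[dimension][1] += 1
        if polarity == "positive":
            counts[dimension][0] += 1
    return {d: pos / total for d, (pos, total) in counts.items()}

# Hypothetical phrase-level labels for one laptop.
labeled = [
    ("performance", "positive"),
    ("performance", "negative"),
    ("performance", "positive"),
    ("weight", "positive"),
    ("design", "positive"),
    ("delivery", "negative"),
]
scores = dimension_scores(labeled)
```

Running the same aggregation over the reviews of two products gives per-dimension scores that can be compared side by side.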

Effects of insulin and IGF on growth and functional differentiation in primary cultured rabbit kidney proximal tubule cells - Effects of IGF-I on Na+ uptake - (초대배양된 토끼 신장 근위세뇨관세포의 성장과 기능분화에 대한 insulin과 IGF의 효과 - Na+ uptake에 대한 IGF-I의 효과 -)

  • Han, Ho-jae;Park, Kwon-moo;Lee, Jang-hern;Yang, IL-suk
    • Korean Journal of Veterinary Research / v.36 no.4 / pp.783-794 / 1996
  • It has been suggested that ion transport systems are intimately involved in mediating the effects of growth regulatory factors on the growth of a number of different types of animal cells in vivo. The functional importance of the apical membrane $Na^+/H^+$ antiporter in the renal proximal tubule is evidenced by estimates that this transporter mediates the reabsorption of approximately one third of the filtered load of sodium and the bulk of the secretion of hydrogen ions. This study was designed to investigate the pathway utilized by IGF-I in regulating sodium transport in primary cultured renal proximal tubule cells. Results were as follows: 1. $Na^+$ was observed to accumulate in the primary cells as a function of time. Raising the concentration of extracellular NaCl induced a decrease in $Na^+$ uptake compared with control cells in a dose-dependent manner. The rate of $Na^+$ uptake into the primary cells was about two times higher in the absence of NaCl ($40.11{\pm}1.76pmole\;Na^+/mg\;protein/min$) than in the presence of 140 mM NaCl ($17.82{\pm}0.94pmole\;Na^+/mg\;protein/min$) at 30 minutes of uptake. 2. $Na^+$ uptake was inhibited by IAA ($1{\times}10^{-4}M$) or valinomycin ($5{\times}10^{-6}M$) treatment ($50.51{\pm}4.04$ and $57.65{\pm}2.27$ of that of control, respectively). $Na^+$ uptake by the primary proximal tubule cells was significantly increased by ouabain ($5{\times}10^{-5}M$) treatment ($140.23{\pm}3.37\%$ of that of control). When actinomycin D ($1{\times}10^{-7}M$) or cycloheximide ($4{\times}10^{-5}M$) was applied, $Na^+$ uptake was decreased to $90.21{\pm}2.39\%$ or $89.64{\pm}3.69\%$ of control in IGF-I ($1{\times}10^{-5}M$) treated cells, respectively. 3. Extracellular cAMP decreased $Na^+$ uptake in a dose-dependent manner ($10^{-8}-10^{-4}M$). IBMX ($5{\times}10^{-5}M$) also inhibited $Na^+$ uptake. Treatment of cells with pertussis toxin (50 pg/ml) or cholera toxin ($1{\mu}g/ml$) inhibited $Na^+$ uptake.
Extracellular PMA decreased $Na^+$ uptake in a dose-dependent manner (1-100 ng/ml). PMA at 100 ng/ml significantly inhibited $Na^+$ uptake in IGF-I treated cells. However, staurosporine ($1{\times}10^{-7}M$) had no effect on $Na^+$ uptake. When PMA and staurosporine were added together, no inhibition of $Na^+$ uptake was observed. In conclusion, sodium uptake in primary cultured rabbit renal proximal tubule cells was dependent on membrane potential and intracellular energy levels. IGF-I stimulates sodium uptake through mechanisms that involve some degree of de novo protein and/or RNA synthesis, and the cAMP and/or PKC pathways mediate the action of IGF-I.

Influence of Fertilizer Type on Physiological Responses during Vegetative Growth in 'Seolhyang' Strawberry (생리적 반응이 다른 비료 종류가 '설향' 딸기의 영양생장에 미치는 영향)

  • Lee, Hee Su;Jang, Hyun Ho;Choi, Jong Myung;Kim, Dae Young
    • Horticultural Science & Technology / v.33 no.1 / pp.39-46 / 2015
  • The objective of this research was to investigate the influence of the composition and concentration of fertilizer solutions on the vegetative growth and nutrient uptake of 'Seolhyang' strawberry. To achieve this, solutions of acid fertilizer (AF), neutral fertilizer (NF), and basic fertilizer (BF) were prepared at concentrations of 100 or $200mg{\cdot}L^{-1}$ based on N and applied during the 100 days after transplanting. Changes in the chemical properties of the soil solution were analysed every two weeks, and crop growth measurements as well as tissue analyses for mineral contents were conducted 100 days after fertilization. Growth was highest in the treatments with BF, followed by those with NF and AF. The heaviest fresh and dry weights among treatments were 151.3 and 37.8 g, respectively, with BF $200mg{\cdot}L^{-1}$. In terms of tissue nutrient contents, the highest N, P and Na contents, of 3.08, 0.54, and 0.10%, respectively, were observed in the treatment with NF $200mg{\cdot}L^{-1}$. The highest K content was 2.83%, in the treatment with AF $200mg{\cdot}L^{-1}$, while the highest Ca and Mg were 0.98 and 0.42%, respectively, in BF $100mg{\cdot}L^{-1}$. The AF treatments had higher tissue Fe, Mn, Zn, and Cu contents than the NF or BF treatments at equal fertilizer concentrations. During the 100 days after fertilization, the highest and lowest pH in the soil solution of root media among all treatments tested were 6.67 in BF $100mg{\cdot}L^{-1}$ and 4.69 in AF $200mg{\cdot}L^{-1}$, respectively. The highest and lowest ECs were $5.132dS{\cdot}m^{-1}$ in BF $200mg{\cdot}L^{-1}$ and $1.448dS{\cdot}m^{-1}$ in BF $100mg{\cdot}L^{-1}$, respectively. For the concentrations of macronutrients in the soil solution of root media, the AF $200mg{\cdot}L^{-1}$ treatment gave the highest $NH_4$ concentrations, followed by NF $200mg{\cdot}L^{-1}$ and AF $100mg{\cdot}L^{-1}$.
The K concentrations rose gradually after day 42 in all treatments. At equal fertilizer concentrations, the highest Ca and Mg concentrations were observed in AF, followed by NF and BF, until day 84 of fertilization. The BF treatments produced the highest $NO_3$ concentrations, followed by NF and AF. The trends in the change of $PO_4$ concentration were similar in all treatments. The $SO_4$ concentrations were higher in treatments with AF than in those with NF or BF until day 70 of fertilization. These results indicate that the composition of the fertilizer solution should be modified to contain more alkali nutrients when 'Seolhyang' strawberry is cultivated in inert media and nutri-culture systems.

Dynamic Virtual Ontology using Tags with Semantic Relationship on Social-web to Support Effective Search (효율적 자원 탐색을 위한 소셜 웹 태그들을 이용한 동적 가상 온톨로지 생성 연구)

  • Lee, Hyun Jung;Sohn, Mye
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.19-33 / 2013
  • In this research, the proposed Dynamic Virtual Ontology using Tags (DyVOT) supports dynamic search of resources according to user requirements, using tags from social web resources. Tags are generally defined as annotations: series of descriptive words attached by social users to information resources such as web pages, images, YouTube, videos, etc. Tags therefore characterize and mirror information resources, and as metadata they can be matched to resources. Consequently, semantic relationships between tags can be extracted, because the relationships between tags depend on the resources they represent. However, this is limited by the allophonic synonyms and homonyms that exist among tags, which are usually marked by a series of words. Thus, research on folksonomies using tags has been applied to the classification of words by semantic-based allophonic synonyms. In addition, some research focuses on clustering and/or classification of resources by semantic-based relationships among tags. Nevertheless, this research is also limited, because it focuses on semantic-based hyper/hypo relationships or clustering among tags without considering conceptual associative relationships between the classified or clustered groups. This makes it difficult to search effectively for resources that match user requirements. In this research, the proposed DyVOT uses tags to construct an ontology for effective search. We assume that tags are extracted from user requirements and used to construct multi sub-ontology as combinations of some or all of the tags. In addition, the proposed DyVOT constructs an ontology based on hierarchical and associative relationships among tags for effective search of a solution. The ontology is composed of static-ontology and dynamic-ontology.
The static-ontology defines semantic-based hierarchical hyper/hypo relationships among tags, as in (http://semanticcloud.sandra-siegel.de/), with a tree structure. From the static-ontology, the DyVOT extracts multi sub-ontology using multi sub-tag, which are constructed from parts of the tags. Each sub-ontology is then constructed from the hierarchy paths that contain the sub-tag. To create the dynamic-ontology, it is necessary to define associative relationships among the multi sub-ontology extracted from the hierarchical relationships of the static-ontology. An associative relationship is defined by the resources shared between tags that are linked by multi sub-ontology. The association is measured by the degree of shared resources allocated to the tags of the sub-ontology. If the association value is larger than a threshold value, a new associative relationship among the tags is created. The associative relationships are used to merge the multi sub-ontology and construct a new hierarchy. To construct the dynamic-ontology, it is essential to define a new class that links two or more sub-ontology; the class is generated by merging tags that are shown, through their shared resources, to be highly associative. The class is then used to generate a new hierarchy with the extracted multi sub-ontology, creating the dynamic-ontology. The newly created class is placed in the ontology and must belong to the dynamic-ontology, so it is used to form a new hyper/hypo hierarchy relationship between the class and the tags linked to the multi sub-ontology. Finally, the DyVOT is developed from the newly defined associative relationships, which are extracted from hierarchical relationships among tags. Resources are matched to the DyVOT, which narrows the search boundary and shortens the search paths.
While a static data catalog (Dean and Ghemawat, 2004; 2008) searches resources statically according to user requirements, the proposed DyVOT searches resources dynamically using multi sub-ontology with parallel processing. In this way, the DyVOT improves the correctness and agility of search and reduces search effort by shortening the search path.
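The thresholding step described above, where an associative relationship is created when the degree of shared resources exceeds a threshold, can be illustrated with a short sketch. The abstract does not define the measure, so the Jaccard overlap used here is an assumption, as are the function names, the sample tags, and the threshold value.

```python
from itertools import combinations

def association(resources_a, resources_b):
    """Degree of shared resources between two tags.

    Assumption: measured as Jaccard overlap of the two resource sets.
    """
    union = resources_a | resources_b
    return len(resources_a & resources_b) / len(union) if union else 0.0

def associative_links(tag_resources, threshold):
    """Return (tag, tag, score) for every pair whose score exceeds the threshold."""
    links = []
    for a, b in combinations(sorted(tag_resources), 2):
        score = association(tag_resources[a], tag_resources[b])
        if score > threshold:
            links.append((a, b, score))
    return links

# Hypothetical tags and the resources they annotate.
tag_resources = {
    "jazz": {"r1", "r2", "r3"},
    "saxophone": {"r2", "r3", "r4"},
    "recipe": {"r5"},
}
links = associative_links(tag_resources, threshold=0.4)
```

Here only the pair "jazz" and "saxophone" shares enough resources to be linked; in the method above, such linked tags are the candidates for merging into a new class of the dynamic-ontology.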

Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.161-177 / 2019
  • In this paper, we study how to improve answer extraction in a Question-Answering system by using sentence dependency parsing results. A Question-Answering (QA) system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from a document. Various studies have been conducted on both. To improve the performance of answer extraction, the grammatical information of sentences must be reflected accurately. Because Korean has free word order and frequently omits sentence components, dependency parsing is a good way to analyze Korean syntax. Therefore, in this study, we improved the performance of answer extraction by adding features generated by dependency parsing to the inputs of the answer extraction model (Bidirectional LSTM-CRF). We compared the performance of the answer extraction model given only basic word features generated without dependency parsing against its performance when the Eojeol tag feature and the dependency graph embedding feature were added. Since dependency parsing is performed on the Eojeol, a component of a sentence separated by spaces, as its basic unit, the tag information of each Eojeol is obtained as a result of dependency parsing. The Eojeol tag feature is this tag information. Generating the dependency graph embedding consists of two steps: generating the dependency graph from the dependency parsing result and learning the embedding of the graph.
From the dependency parsing result, a graph is generated by mapping each Eojeol to a node, each dependency between Eojeol to an edge, and each Eojeol tag to a node label. Either an undirected or a directed graph is generated, depending on whether the direction of the dependency relation is considered. To obtain the embedding of the graph, we used Graph2Vec, a method that finds the embedding of a graph from the subgraphs that constitute it. The maximum path length between nodes can be specified when finding the subgraphs of a graph. If the maximum path length is 1, the graph embedding is generated only from direct dependencies between Eojeol; as the maximum path length grows, the embedding also includes indirect dependencies. In the experiment, the maximum path length between nodes was varied from 1 to 3, with and without considering the direction of dependency, and the performance of answer extraction was measured. Experimental results show that both the Eojeol tag feature and the dependency graph embedding feature improve the performance of answer extraction. In particular, the highest answer extraction performance was obtained when the model input was the embedding of the directed dependency graph generated with a maximum path length of 1 in the subgraph extraction process of Graph2Vec. From these experiments, we concluded that it is better to take the direction of dependency into account and to consider only direct connections rather than indirect dependencies between words. The significance of this study is as follows. First, we improved the performance of answer extraction by adding features based on dependency parsing results, taking into account the characteristics of Korean, which has free word order and frequent omission of sentence components.
Second, we generated features from the dependency parsing result with a learning-based graph embedding method, without defining patterns of dependency between Eojeol. Future research directions are as follows. In this study, the features generated from dependency parsing were applied only to the answer extraction model. In the future, if their performance is confirmed by applying the features to various natural language processing models such as sentiment analysis or named entity recognition, the validity of the features can be verified more accurately.
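The graph construction described above (Eojeol as nodes, dependencies as edges, Eojeol tags as node labels, with or without direction) and the effect of the maximum path length can be sketched as follows. This is not Graph2Vec itself and uses no parser: the head indices, the tag names, and the function names are hypothetical, and the sketch only shows which nodes fall within a given path length of a starting node.

```python
from collections import deque

def build_graph(heads, directed=True):
    """Build an adjacency map from head indices (heads[i] = head of Eojeol i, -1 = root)."""
    adj = {i: set() for i in range(len(heads))}
    for dep, head in enumerate(heads):
        if head < 0:
            continue
        adj[dep].add(head)       # edge from dependent to head
        if not directed:
            adj[head].add(dep)   # also allow traversal from head to dependent
    return adj

def within_path_length(adj, start, max_len):
    """Nodes reachable from start in at most max_len edges (breadth-first search)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_len:
            continue
        for nxt in adj[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)

# Hypothetical parse of a four-Eojeol sentence: tags are node labels.
tags = ["NP_SBJ", "VP_MOD", "NP_OBJ", "VP"]
heads = [1, 3, 3, -1]

directed_graph = build_graph(heads, directed=True)
undirected_graph = build_graph(heads, directed=False)

direct_only = within_path_length(directed_graph, 0, 1)    # direct dependency only
with_indirect = within_path_length(directed_graph, 0, 2)  # adds an indirect dependency
```

With a maximum path length of 1, the neighborhood of Eojeol 0 contains only its direct head; raising the limit to 2 pulls in the head's head, which corresponds to the indirect dependencies mentioned above.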

Target-Aspect-Sentiment Joint Detection with CNN Auxiliary Loss for Aspect-Based Sentiment Analysis (CNN 보조 손실을 이용한 차원 기반 감성 분석)

  • Jeon, Min Jin;Hwang, Ji Won;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.27 no.4 / pp.1-22 / 2021
  • Aspect-Based Sentiment Analysis (ABSA), which analyzes sentiment based on the aspects that appear in a text, is drawing attention because it can be used in various business industries. ABSA analyzes sentiment separately for each of the multiple aspects in a text. It is studied in various forms depending on the purpose, such as analyzing all targets or only aspects and sentiments. Here, the aspect refers to a property of a target, and the target refers to the text that causes the sentiment. For example, for restaurant reviews, the aspects could be food taste, food price, quality of service, mood of the restaurant, etc. If a review says, "The pasta was delicious, but the salad was not," the words "pasta" and "salad," which are directly mentioned in the sentence, become the "targets." So far, most ABSA studies have analyzed sentiment based only on aspects or only on targets. However, sentiment analysis can be inaccurate even for the same aspect or target, for instance when aspects or sentiments are divided or when sentiment exists without a target. Consider the sentence "Pizza and the salad were good, but the steak was disappointing." Although the aspect of this sentence is limited to "food," conflicting sentiments coexist. In the sentence "Shrimp was delicious, but the price was extravagant," the target is "shrimp," but opposite sentiments coexist depending on the aspect. Finally, in a sentence like "The food arrived too late and is cold now," there is no target (NULL), but the sentence conveys a negative sentiment toward the aspect "service." Failing to consider both aspects and targets in such cases, when sentiment or aspect is divided or when sentiment exists without a target, creates a dual dependency problem.
To address this problem, this research analyzes sentiment by considering both aspects and targets (Target-Aspect-Sentiment Detection, hereafter TASD). This study identified limitations of existing TASD research: local contexts are not fully captured, and the F1-score drops dramatically depending on the number of epochs and the batch size. The existing model excels at capturing the overall context and the relations between words. However, it struggles with phrases in the local context and is relatively slow to learn. Therefore, this study tries to improve the model's performance. To this end, we added an auxiliary loss for aspect-sentiment classification by constructing CNN (Convolutional Neural Network) layers in parallel with the existing model. Whereas existing models analyze aspect-sentiment through BERT encoding, Pooler, and Linear layers, this research added CNN layers with adaptive average pooling to the existing model, and training proceeded by adding the additional aspect-sentiment loss to the existing loss. In other words, during training, the auxiliary loss computed through the CNN layers allowed the local context to be captured more closely. After training, the model performs aspect-sentiment analysis through the existing method. To evaluate the model, two datasets, SemEval-2015 task 12 and SemEval-2016 task 5, were used, and the F1-score increased compared to the existing models. When the batch size was 8 and the number of epochs was 5, the difference between the F1-scores of the existing models and this study was largest, at 29 and 45, respectively. Even when the batch size and number of epochs were adjusted, the F1-scores were higher than those of the existing models. This indicates that the model can be trained effectively even with small batch sizes and few epochs, compared to the existing models. Therefore, it can be useful in situations where resources are limited. Through this study, aspect-based sentiments can be analyzed more accurately.
Through various uses in business, such as development or establishing marketing strategies, both consumers and sellers will be able to make efficient decisions. In addition, it is believed that the model can be fully trained and utilized by small businesses and others that do not have much data, given that it uses a pre-trained model and recorded a relatively high F1-score even with limited resources.
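The training objective described above, in which a loss from the parallel CNN branch is added to the existing aspect-sentiment loss, can be sketched for a single example as follows. This is a plain-Python illustration rather than the authors' model: the logits are made-up numbers, and the `aux_weight` parameter is an assumption, since the abstract only says the additional loss is added to the existing one.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one example from raw logits, using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def total_loss(main_logits, aux_logits, target, aux_weight=1.0):
    """Existing-branch loss plus the weighted auxiliary (CNN-branch) loss.

    Assumption: aux_weight = 1.0 corresponds to simply adding the two losses.
    """
    main = cross_entropy(main_logits, target)
    aux = cross_entropy(aux_logits, target)
    return main + aux_weight * aux

# Hypothetical logits over three sentiment classes for one aspect.
main_logits = [2.0, 0.5, -1.0]  # from the existing BERT-based branch
aux_logits = [1.0, 1.2, -0.5]   # from the parallel CNN branch
loss = total_loss(main_logits, aux_logits, target=0)
```

Because the auxiliary branch is only used to shape the gradients during training, prediction afterwards can rely on the main branch alone, which matches the description that the model performs inference through the existing method.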

A Study on the Identifying OECMs in Korea for Achieving the Kunming-Montreal Global Biodiversity Framework - Focusing on the Concept and Experts' Perception - (쿤밍-몬트리올 글로벌 생물다양성 보전목표 성취를 위한 우리나라 OECM 발굴방향 연구 - 개념 고찰 및 전문가 인식을 중심으로 -)

  • Hag-Young Heo;Sun-Joo Park
    • Korean Journal of Environment and Ecology / v.37 no.4 / pp.302-314 / 2023
  • This study explores how Korea can respond effectively to Target 3 (30by30), the core of the Kunming-Montreal Global Biodiversity Framework (K-M GBF) of the Convention on Biological Diversity (CBD), and seeks a direction for systematic, national-level identification of OECMs (Other Effective area-based Conservation Measures) through a review of global concepts and a survey of expert perceptions of OECMs. The study examined ① the use of Korean terms related to OECM, ② the derivation of determining criteria reflecting global standards, ③ the derivation of types of potential OECM candidates in Korea, and ④ considerations for OECM identification and reporting, in order to explore the direction for identifying systematic, national-level OECMs that comply with global standards and reflect the Korean context. First, there was consensus on using Korean terminology that reflects the concept of OECM rather than a simple translation; "nature coexistence area" was the most preferred term (12 people) and shares the context of the CBD 2050 Vision of "a world of living in harmony with nature." This study suggests using four criteria (1. No protected areas, 2. Geographic boundaries, 3. Governance/management, and 4. Biodiversity value) that reflect OECM's core characteristics in the first-stage selection process, carrying out a consensus-building process with the relevant agencies in stage 2, and adding two criteria (3-1 Effectiveness and sustainability of governance and management and 4-1 Long-term conservation) for the in-depth diagnosis in stage 3 (full assessment for reporting). The 28 types examined in this study were generally compatible with OECMs (4.45-6.21/7 points, mean 5.24). In particular, "Conservation Properties (6.21 points)" and "Conservation Agreements (6.07 points)", which are controlled by the National Nature Trust, were found to be the most in line with the OECM concept.
They were followed by "Buffer zone of World Natural Heritage (5.77 points)", "Temple Forest (5.73 points)", "Green-belt (Restricted development zones, 5.63 points)", "DMZ (5.60 points)", and "Buffer zone of biosphere reserve (5.50 points)", which also showed high potential. In the case of "Uninhabited Islands under Absolute Conservation", the rating for conformity with protected areas (5.83/7 points) was higher than the OECM compatibility rating (5.52/7 points); it would therefore be preferable in the future to promote listing these islands, along with their surrounding waters (1 km), in the Korea Database on Protected Areas (KDPA). Based on the results of the global OECM standard review and the expert perception survey, 10 items were suggested as considerations when identifying OECMs in the Korean context. In the future, continued research is needed to identify potential OECMs through site-level assessment against these considerations and to establish an effective in-situ conservation system at the national level by linking existing protected area systems with the identified OECMs.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the disease caused by severe acute respiratory syndrome coronavirus 2) were published. The rapid increase in the number of papers related to COVID-19 imposes time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. Therefore, this study proposes a method of extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers related to the search keywords were extracted from the COVID-19 papers, and detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic and updated weekly. The research method consists of two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers that included full text. For this purpose, the number of COVID-19 publications by year was analyzed through exploratory data analysis in Python, and the top 10 journals with the most active research were identified. The LDA and Word2vec algorithms were used to derive research topics related to COVID-19, and after related words were analyzed, their similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each set of collected papers, detailed topics were analyzed using the LDA and Word2vec algorithms, and clustering after PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy result is that topics that did not emerge from the topic modeling of all COVID-19 papers did emerge from the topic modeling results for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, topic extraction from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells, instead of defending against attacks, attack normal cells. Hidden topics that could not be found across the entire corpus were thus surfaced by classifying papers according to keywords and performing topic modeling to find detailed topics. In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from a central word. Combining the LDA model and the Word2vec model was intended to improve performance by identifying both the relationships between documents and LDA topics and the relationships among documents captured by Word2vec. In addition, as a clustering method based on PCA dimension reduction, we presented a way to classify documents intuitively by using the t-SNE technique to group documents with similar themes into a structured organization. In a situation where researchers' efforts to overcome COVID-19 cannot keep up with the rapid publication of related academic papers, we hope this approach will save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
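
The Skip-gram and similarity steps mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `skipgram_pairs` and `cosine_similarity`, the window size, and the toy tokens are all assumptions made for illustration. It only shows how (center word, context word) training pairs are formed and how similarity between two word vectors is commonly measured; a real pipeline would train the vectors with a Word2vec library.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs: Skip-gram learns to predict each
    context word within `window` positions of the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical tokens, not taken from the CORD-19 data.
    tokens = ["neutralizing", "antibodies", "block", "infection"]
    for pair in skipgram_pairs(tokens, window=1):
        print(pair)
```

A larger window produces more pairs per center word, which tends to emphasize topical rather than syntactic similarity; the window size the authors used is not stated in the abstract.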

  • Structural features and Diffusion Patterns of Gartner Hype Cycle for Artificial Intelligence using Social Network analysis (인공지능 기술에 관한 가트너 하이프사이클의 네트워크 집단구조 특성 및 확산패턴에 관한 연구)

    • Shin, Sunah;Kang, Juyoung
      • Journal of Intelligence and Information Systems
      • /
      • v.28 no.1
      • /
      • pp.107-129
      • /
      • 2022
    • It is important to preempt new technology because technology competition is getting much tougher. Stakeholders continuously explore new technologies in order to secure them at the right time. Gartner's Hype Cycle has significant implications for stakeholders. The Hype Cycle is an expectation graph for new technologies that combines the technology life cycle (S-curve) with the Hype Level. Stakeholders such as R&D investors, CTOs (Chief Technology Officers), and technical personnel are very interested in Gartner's Hype Cycle for new technologies, because high expectations for new technologies can create opportunities to sustain investment by securing the legitimacy of R&D investment. However, despite the industry's high interest, preceding research has been limited in its empirical methods and source data (news, academic papers, search traffic, patents, etc.). In this study, we focused on two research questions. The first research question was 'Is there a difference in the characteristics of the network structure at each stage of the hype cycle?'. To answer it, the structural characteristics of each stage were examined through the component cohesion size. The second research question was 'Is there a pattern of diffusion at each stage of the hype cycle?'. This question was addressed through the centralization index and network density. The centralization index is a variance-like concept, and a higher centralization index means that the network is centered on a small number of nodes. Concentration on a small number of nodes implies a star network structure. The star network structure is a centralized structure and shows better diffusion performance than a decentralized network (circle structure), because the nodes at the center of information transfer can judge useful information and deliver it to other nodes the fastest. 
We therefore examined the out-degree centralization index and in-degree centralization index for each stage. For this purpose, we examined the structural features of the community and the expectation diffusion patterns using Social Network Service (SNS) data on the 'Gartner Hype Cycle for Artificial Intelligence, 2021'. Twitter data for 30 technologies (excluding four technologies) listed in 'Gartner Hype Cycle for Artificial Intelligence, 2021' were analyzed. Analysis was performed using R (version 4.1.1) and Cyram Netminer. From October 31, 2021 to November 9, 2021, 6,766 tweets were retrieved through the Twitter API and converted into relationships between a user's tweet (Source) and users' retweets (Target). As a result, 4,124 edge list entries were analyzed. We confirmed the structural features and diffusion patterns by analyzing the component cohesion size, degree centralization, and density. We confirmed that, as time passed, the number of components in the groups at each stage increased and the density decreased. Also, 'Innovation Trigger', the group interested in new technologies as early adopters in innovation diffusion theory, had a high out-degree centralization index, while the other groups had a higher in-degree than out-degree centralization index. It can be inferred that the 'Innovation Trigger' group has the biggest influence and that diffusion gradually slows in the subsequent groups. In this study, network analysis was conducted using social network service data, unlike the methods of preceding research. This is significant in that it suggests a way to expand the methods used to analyze Gartner's hype cycle in the future. In addition, applying innovation diffusion theory to the stages of Gartner's hype cycle for artificial intelligence can be evaluated positively, because the hype cycle's theoretical weakness has been repeatedly discussed. 
It is also expected that this study will provide stakeholders with a new perspective on technology investment decisions.
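
To make the centralization index and density discussed in this abstract concrete, here is a small Python sketch comparing a star network with a circle network. It assumes Freeman-style degree centralization normalized by (n - 1)^2 for a directed graph without self-loops; the normalization used by Netminer in the study may differ, and the function names and toy edge lists are hypothetical.

```python
from collections import Counter


def _simple_edges(edges):
    """Drop self-loops and duplicate (source, target) pairs."""
    return {(u, v) for u, v in edges if u != v}


def degree_centralization(edges, nodes, mode="out"):
    """Freeman-style degree centralization for a directed graph.

    Sums the gap between the largest degree and every node's degree,
    then divides by (n - 1)^2, the value reached by a directed star.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    position = 0 if mode == "out" else 1  # source counts out-degree, target in-degree
    counts = Counter(edge[position] for edge in _simple_edges(edges))
    degrees = [counts[node] for node in nodes]
    largest = max(degrees)
    return sum(largest - d for d in degrees) / ((n - 1) ** 2)


def density(edges, nodes):
    """Share of the n * (n - 1) possible directed edges that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(_simple_edges(edges)) / (n * (n - 1))


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d", "e"]
    # Star: one account's tweet (Source) is retweeted by everyone else (Target).
    star = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e")]
    # Circle: each account is retweeted by exactly one other account.
    circle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
    for name, edges in (("star", star), ("circle", circle)):
        print(name,
              degree_centralization(edges, nodes, "out"),
              degree_centralization(edges, nodes, "in"),
              density(edges, nodes))
```

Under this normalization the star reaches the maximum out-degree centralization while the circle scores zero, which matches the abstract's contrast between centralized and decentralized structures.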


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.