• Title/Summary/Keyword: multivariate target variable (다변량 목표변수)

A Study on the Node Split in Decision Tree with Multivariate Target Variables (다변량 목표변수를 갖는 의사결정나무의 노드분리에 관한 연구)

  • Kim, Seong-Jun
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.4 / pp.386-390 / 2003
  • Data mining is the process of discovering patterns useful for decision making from large amounts of data. It has recently received much attention in a wide range of business and engineering fields. Classifying a group into subgroups is one of the most important tasks in data mining. Tree-based methods, known as decision trees, provide an efficient way of finding a classification model. The primary concern in tree learning is to minimize node impurity, which is evaluated using a target variable in the data set. However, there are situations in which multiple target variables should be taken into account, such as manufacturing process monitoring, marketing science, and clinical and health analysis. The purpose of this article is to present some methods for measuring node impurity that are applicable to data sets with multivariate target variables. For illustration, a numerical example is given with discussion.
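
The abstract does not state which impurity measures the article proposes. As a minimal sketch of the general idea only, one simple option is to sum the Gini impurity of each categorical target within a node and to score a split by the weighted decrease in that sum. The function names and the choice of summed Gini are assumptions made for illustration, not the author's method.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a single categorical target within a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def multi_target_impurity(rows):
    """Assumed multivariate impurity: the sum of per-target Gini impurities.

    `rows` is a list of tuples, one tuple of target values per observation.
    """
    if not rows:
        return 0.0
    # zip(*rows) regroups the observations into one column per target variable.
    return sum(gini(column) for column in zip(*rows))


def split_gain(parent, left, right):
    """Decrease in impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * multi_target_impurity(left) \
        + (len(right) / n) * multi_target_impurity(right)
    return multi_target_impurity(parent) - weighted


# Two targets per observation; the split separates both targets perfectly.
parent = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
gain = split_gain(parent, parent[:2], parent[2:])
```

Summing per-target impurities treats the targets as equally important and ignores their correlation; a weighted sum or a covariance-based measure would be alternative design choices.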

Bivariate drought frequency analysis using copula function (Copula 함수 기반의 이변량 가뭄빈도 해석)

  • Lee, Jeong Ju;Kim, Ha Yung;Kwon, Moon Hyuck;Kwon, Hyun Han
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.309-309 / 2022
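  • When characterizing extreme-event data, frequency analysis of hydrological data generally relies on univariate methods based on a single random variable. However, when two or more variables are correlated, multivariate frequency analysis is required; analyzing them univariately can cause problems such as underestimation of the return period. To address this, research on multivariate frequency analysis has continued in recent years (Kwon and Lall, 2016; Vaziri et al., 2018). For drought in particular, not only intensity but also duration and severity are regarded as very important factors. Because drought duration and severity are strongly correlated, multivariate drought frequency analysis is known to be more advantageous than univariate analysis for drought risk assessment (Shiau and Shen, 2001; Kim et al., 2017). Accordingly, studies on multivariate frequency analysis that use copula functions to combine the two are being actively conducted. For floods, a frequency analysis procedure using annual maximum rainfall series for each duration has been established as a guideline and is used in water resources design practice, whereas for drought there are currently no guidelines or analysis tools that can be used in practice. For this reason, the Ministry of Environment and the National Drought Information Analysis Center produced and distributed a program for univariate drought frequency analysis in 2020. This study aims to develop a tool for bivariate drought frequency analysis that jointly considers two highly correlated factors characterizing drought: severity and duration. We built a program that performs reliable bivariate drought frequency analysis by selecting the best-fitting marginal distributions from a variety of probability distributions and estimating the optimal joint distribution with recent copula functions. After steps such as the distribution of a test version, the program will be released so that anyone can use it.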
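
The abstract does not say which copula families the program implements. As a minimal sketch of how a copula turns two marginal non-exceedance probabilities into joint return periods, the example below uses the Gumbel-Hougaard copula, chosen here only for illustration. The inputs `u` and `v` stand for the marginal probabilities of drought duration and severity, and `mean_interarrival` is the mean time between drought events in years; all names and parameter values are assumptions.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C(u, v); theta = 1 corresponds to independence."""
    if not (0.0 < u < 1.0 and 0.0 < v < 1.0):
        raise ValueError("u and v must lie strictly between 0 and 1")
    if theta < 1.0:
        raise ValueError("theta must be at least 1")
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def joint_return_periods(u, v, theta, mean_interarrival=1.0):
    """Return (T_or, T_and) in the same time unit as mean_interarrival.

    T_or:  duration OR severity exceeds its threshold.
    T_and: duration AND severity both exceed their thresholds.
    """
    c = gumbel_copula(u, v, theta)
    t_or = mean_interarrival / (1.0 - c)
    t_and = mean_interarrival / (1.0 - u - v + c)
    return t_or, t_and


# Hypothetical inputs: both marginal non-exceedance probabilities equal 0.9.
t_or_indep, t_and_indep = joint_return_periods(0.9, 0.9, theta=1.0)
t_or_dep, t_and_dep = joint_return_periods(0.9, 0.9, theta=2.0)
```

With stronger dependence (larger `theta`), the "and" return period shortens relative to the independent case, which is why ignoring the correlation between duration and severity can misstate drought risk.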

Analysis of Employment Effect of SMEs According to the Results of Technology Appraisal for Investment (투자용 기술평가 결과에 따른 중소기업의 고용효과 분석)

  • Lee, Jun-won
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.18 no.4 / pp.77-88 / 2023
  • The purpose of this study is to examine whether the current technology appraisal model for investment, which is designed to identify SMEs with high sales growth (one of the characteristics of gazelle companies), can also be extended to employment effects. The sample consists of SMEs classified as technology investment adequate firms (TI1-TI6) through technology appraisal for investment between 2016 and 2018. The employment effect was analyzed separately as an absolute employment effect and a relative employment effect. The analysis confirmed that the technology appraisal items for investment defined as innovation characteristics did not have significant explanatory power for the absolute employment effect. For the relative employment effect, however, technicality (TC) was found to have significant explanatory power among the innovation characteristics, because this item is appraised on the basis of future growth potential. Because the relative employment effect is meaningful as a measure of the actual employment effect, the study concludes that the current technology appraisal model for investment can be extended to employment effects.

Related Factors to Health Behavior by Patients With Hyperlipidemia Based on Health Belief Model (건강신념모형에 기초한 고지혈증 환자의 건강행태 관련요인)

  • Lee, Eun-Sun;Na, Baeg-Ju;Lee, Moo-Sik;Lee, Jin-Yong;Hong, Jee-Young;Lim, Young-Shil
    • Proceedings of the KAIS Fall Conference / 2011.05b / pp.1057-1060 / 2011
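  • This study was undertaken to identify the relationship between the main variables of the health belief model and the health behavior of patients with hyperlipidemia, in order to promote their health behavior and, further, to provide basic data for planning chronic disease health programs and education programs. From July 2009 to September 2010, data were collected with a structured questionnaire from 146 adult men and women aged 20 or older who had been diagnosed with hyperlipidemia, with total cholesterol of 240 mg/dl or higher and triglycerides of 200 mg/dl or higher. Reliability analysis with Cronbach's alpha, factor analysis, and univariate and multivariate analyses were performed using SPSS WIN (version 14.0, Korean edition). The results are as follows. First, among awareness of LDL-cholesterol, HDL-cholesterol, and TG, awareness of TG was the highest, and 28.08% of participants were aware of all three. The mean hyperlipidemia knowledge score across nine items was 6.51 out of 9, and higher knowledge was associated with a higher level of health behavior. Second, factor analysis regrouped the 10 health behaviors into two factors: health behavior factor 1, 'diet, exercise habits, and hyperlipidemia and related testing', and health behavior factor 2, 'smoking, drinking habits, and hyperlipidemia treatment'. The health belief variables significantly related to factor 1 were severity, benefits, and barriers, whereas susceptibility showed no correlation. In descending order of strength, the correlations with health behavior were benefits (r = .455), severity (r = .38), and barriers (r = -.244), so perceived benefits were most closely related to factor 1. Factor 2, however, was not related to the health belief variables. Third, regarding cues to action, having received education was associated with significant differences in both factor 1 and factor 2, showing that education has an important influence on the health behavior of patients with hyperlipidemia. Fourth, in the multiple regression analysis, the significant factors affecting factor 1 were perceived severity, perceived benefits, whether education had been received, and the extent of education at public health centers. For factor 2, sex, age, and whether education had been received were significant. Taken together, these results suggest that the health belief model is suitable for predicting health behavior in hyperlipidemia, and that education will be more effective if programs and educational objectives are set, according to the characteristics of each health behavior factor, so that patients perceive the benefits of hyperlipidemia prevention highly.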
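
The study ran its reliability analysis in SPSS. For readers unfamiliar with the statistic, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy responses are hypothetical and are not taken from the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item (one column per questionnaire item).
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical answers from three respondents to a two-item scale.
alpha_consistent = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
alpha_mixed = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Perfectly consistent items give an alpha of 1, and alpha falls as the items disagree with each other.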

Evaluation Methods of Soil Resilience Related to Agricultural Environment (농업환경 분야에서 토양 리질리언스 분야별 평가 방법)

  • Kim, Min-Suk;Min, Hyun-Gi;Hyun, Seung-Hun;Kim, Jeong-Gyu
    • Ecology and Resilient Infrastructure / v.7 no.2 / pp.97-113 / 2020
  • Soil is the foundation of human life and the basis for food security. For this reason, it is prioritized in the UN's Sustainable Development Goals (SDGs). Research on soil resilience in the agricultural environment is therefore crucial for sound and sustainable soil management, especially under highly uncertain and unpredictable conditions. Soil resilience is defined in different ways by different researchers; however, its definition typically includes the concepts of recovery and resistance to stress. This study summarizes the physical, chemical, and biological characteristics of soils that are used to assess soil resilience, i.e., the response of soil to various types of stress. In addition, it summarizes various statistical processing techniques and quantification methods, considering the wide spatial and temporal scope of soil resilience research. Soil resilience studies typically follow five steps: (1) soil and site selection, (2) stress (independent variable) setting, (3) soil characteristics and indicator (dependent variable) setting, (4) experiments at various spatiotemporal scales, and (5) statistical analysis. Previous studies and the present one provide a general introduction to soil resilience, on the basis of which further practical research that considers the domestic agricultural environment should be conducted. The extensive range of soil resilience measurements will require collaboration between researchers in various fields.

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae;Kang, Jungseok
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.1-32 / 2018
  • Corporate defaults have a ripple effect not only on the stakeholders of bankrupt companies, including managers, employees, creditors, and investors, but also on the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a single default prediction model rather than developing a variety of corporate default models. As a result, even large corporations, the so-called 'chaebol enterprises', went bankrupt. Even after that, the analysis of past corporate defaults focused on specific variables, and when the government carried out restructuring immediately after the global financial crisis, it focused only on certain main variables such as the 'debt ratio'. A multifaceted study of corporate default prediction models is essential to protect diverse interests and to avoid a sudden total collapse such as the 'Lehman Brothers Case' of the global financial crisis. The key variables that drive corporate defaults vary over time. This is confirmed by Deakin's (1972) study, which, in light of the analyses of Beaver (1967, 1968) and Altman (1968), shows that the major factors affecting corporate failure have changed. In Grice's (2001) study, the importance of predictive variables was likewise established through the models of Zmijewski (1984) and Ohlson (1980). However, past studies use static models, and most of them do not consider the changes that occur over time. Therefore, to construct consistent prediction models, it is necessary to compensate for time-dependent bias by means of a time series analysis algorithm that reflects dynamic change. Centered on the global financial crisis, which had a significant impact on Korea, this study uses 10 years of annual corporate data from 2000 to 2009. The data are divided into training, validation, and test sets covering 7, 2, and 1 years, respectively.
To construct a bankruptcy model that remains consistent over time, we first train a time series deep learning model using the data from before the financial crisis (2000~2006). Parameter tuning of the existing models and the deep learning time series algorithm is conducted with validation data that include the financial crisis period (2007~2008). The result is a model that shows a pattern similar to the results on the training data and has excellent predictive power. After that, each bankruptcy prediction model is rebuilt by combining the training and validation data (2000~2008) and applying the optimal parameters found in the preceding validation. Finally, each corporate default prediction model, trained on nine years of data, is evaluated and compared using the test data (2009). In this way, the usefulness of the corporate default prediction model based on the deep learning time series algorithm is demonstrated. In addition, by adding Lasso regression to the existing variable selection methods (multiple discriminant analysis, logit model), it is shown that the deep learning time series model based on the three bundles of variables is useful for robust corporate default prediction. The definition of bankruptcy is the same as that of Lee (2015). Independent variables include financial information such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable group. The multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and deep learning time series algorithms are compared. Corporate data suffer from the limitations of 'nonlinear variables', 'multi-collinearity' among variables, and 'lack of data'.
The logit model addresses nonlinearity, the Lasso regression model addresses the multi-collinearity problem, and the deep learning time series algorithm, using the variable data generation method, compensates for the lack of data. Big data technology, a leading technology of the future, is moving from simple human analysis to automated AI analysis and, finally, toward intertwined AI applications. Although the study of corporate default prediction models using time series algorithms is still in its early stages, the deep learning algorithm is much faster than regression analysis for corporate default prediction modeling and also has greater predictive power. With the Fourth Industrial Revolution, the current Korean government and governments overseas are working hard to integrate such systems into the everyday life of their nations and societies. Yet deep learning time series research for the financial industry remains insufficient. This is an initial study on deep learning time series analysis of corporate defaults. It is therefore hoped that it will serve as comparative reference material for non-specialists who begin studies combining financial data and deep learning time series algorithms.
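
The chronological split described in the abstract (training on 2000~2006, validation on 2007~2008, testing on 2009) can be sketched as follows. The record layout (a dict with a `year` key) and the function name `split_by_year` are assumptions made for illustration; the abstract does not describe the authors' data format or code.

```python
def split_by_year(records, train_end=2006, valid_end=2008):
    """Split annual records chronologically into train, validation, and test sets.

    Records up to `train_end` go to training, records up to `valid_end` go to
    validation, and all later records go to the test set.
    """
    train, valid, test = [], [], []
    for record in records:
        year = record["year"]
        if year <= train_end:
            train.append(record)
        elif year <= valid_end:
            valid.append(record)
        else:
            test.append(record)
    return train, valid, test


# Hypothetical data: one record per year for a single firm, 2000 to 2009.
records = [{"firm": "A", "year": year} for year in range(2000, 2010)]
train, valid, test = split_by_year(records)

# For the final evaluation, the abstract describes retraining on 2000~2008.
final_train = train + valid
```

Splitting by year rather than at random keeps future observations out of the training data, which matters when the goal is a model that stays consistent as conditions change over time.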