• Title/Summary/Keyword: regression tree model (회귀나무모형)

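A Machine Learning-Based Study on Predicting Administrative Issue Designation in the KOSDAQ Market (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구)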

  • Yun, Yang-Hyeon;Kim, Tae-Gyeong;Kim, Su-Yeong;Park, Yong-Gyun
    • Proceedings of the Korean Society of Business Venturing Conference (한국벤처창업학회 학술대회논문집) / 2021.11a / pp.185-187 / 2021
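  • The administrative issue designation system is a market regulation scheme that warns of financial distress among listed companies, giving those companies a chance to recover and alerting investors to investment risk. This study predicts administrative issue designation using the financial data of designated and non-designated companies as the sample. The methods used were logistic regression, decision tree, support vector machine, soft voting, random forest, and LightGBM. LightGBM was the best predictive model, with a classification accuracy of 82.73%, while the decision tree had the lowest accuracy, at 71.94%. In general, ensemble-based learning models showed higher predictive performance than single learning models.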
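Soft voting, one of the ensemble methods listed in this abstract, averages the class probabilities predicted by several classifiers and selects the class with the highest mean probability. The following is a minimal sketch under stated assumptions: the function name `soft_vote` and all probability values are hypothetical illustrations and are not taken from the paper.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-class probabilities from several classifiers.

    prob_lists: one list of class probabilities per classifier,
                all in the same class order (hypothetical inputs).
    weights:    optional per-classifier weights; equal weights if None.
    Returns (index of the winning class, averaged probabilities).
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_classes = len(prob_lists[0])
    # Weighted mean probability for each class across all classifiers.
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: avg[c])
    return winner, avg


# Made-up probabilities for the classes [not designated, designated]
# from three hypothetical base classifiers.
predictions = [
    [0.60, 0.40],
    [0.30, 0.70],
    [0.45, 0.55],
]
label, mean_probs = soft_vote(predictions)
```

With these made-up inputs the first classifier alone would choose "not designated", but the averaged probabilities favor "designated", which is how an ensemble can overrule a single weak vote.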

Development of Multiple Linear Regression Model to Predict Agricultural Reservoir Storage based on Naive Bayes Classification and Weather Forecast Data (나이브 베이즈 분류와 기상예보자료 기반의 농업용 저수지 저수율 전망을 위한 저수율 예측 다중선형 회귀모형 개발)

  • Kim, Jin Uk;Jung, Chung Gil;Lee, Ji Wan;Kim, Seong Joon
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.112-112 / 2018
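  • Localized and regional droughts caused by abnormal climate have recently become more frequent. Not only the number of events but also drought severity and duration have increased markedly compared with the past, so the resulting damage is expected to grow. In particular, the unprecedented drought of 2014-2015 restricted reservoir water supply and harmed many farms. The purpose of this study is to develop a multiple linear regression model of agricultural reservoir storage rate that can use the Korea Meteorological Administration (KMA) 3-month forecast data, and to produce storage rate outlook information for agricultural reservoirs nationwide. To develop a nationally applicable model, five meteorological elements (precipitation, maximum temperature, minimum temperature, mean temperature, and mean wind speed) and observed reservoir storage rates were used. Meteorological observations for 2002 to 2017 were collected from 63 KMA ground stations. The outlook procedure was divided into three steps. First, multiple linear regression was performed using meteorological and observed storage rate data for 65 representative reservoirs, which the Korea Rural Community Corporation had proposed through cluster analysis and decision tree analysis of 511 water supply districts nationwide. Monthly regression equations were estimated with the collected meteorological elements and storage rate as independent variables, giving coefficients of determination ($R^2$) of 0.51 to 0.95. Second, to extend the regression results for the representative reservoirs to reservoirs nationwide, the naive Bayes classification method was applied to classify 3,098 reservoirs into 65 clusters, and the monthly regression equation for each cluster was estimated. Finally, for agricultural drought prediction, KMA GS5 (Global Seasonal Forecasting System 5) 3-month forecast data were collected and applied to the regression equations estimated for reservoirs nationwide, producing 3-month storage rate outlooks for 2017. The storage rate outlook technique based on the nationwide reservoir clustering showed a significant correlation with the observed 2017 storage rates, and the results are expected to be usable as data for agricultural reservoir water supply and agricultural drought outlooks.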
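The coefficient of determination ($R^2$) reported in this abstract measures how much of the variance in observed storage rates a regression equation explains. The sketch below shows only that calculation. The function name `r_squared` and the storage rate values are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after applying the regression.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up monthly storage rates (%) and hypothetical regression predictions.
observed_rates = [40.0, 55.0, 70.0, 85.0]
predicted_rates = [42.0, 53.0, 72.0, 83.0]
r2 = r_squared(observed_rates, predicted_rates)
```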

Comparison of Methodologies for Characterizing Pedestrian-Vehicle Collisions (보행자-차량 충돌사고 특성분석 방법론 비교 연구)

  • Choi, Saerona;Jeong, Eunbi;Oh, Cheol
    • Journal of Korean Society of Transportation / v.31 no.6 / pp.53-66 / 2013
  • The major purpose of this study is to evaluate methodologies for predicting the injury severity of pedestrian-vehicle collisions. The methodologies evaluated and compared include Binary Logistic Regression (BLR), the Ordered Probit Model (OPM), Support Vector Machine (SVM), and Decision Tree (DT) methods. The comparison yields insights into applying these methodologies to analyze the characteristics of pedestrian injury severity. For identifying causal factors affecting injury severity, statistical approaches such as BLR and OPM are recommended. To achieve better prediction performance, on the other hand, heuristic approaches such as SVM and DT are recommended. The outcome of this study is expected to be useful in developing countermeasures for enhancing pedestrian safety.

A Study for the Development of a Bid Price Rate Prediction Model (낙찰률 예측 모형에 관한 연구)

  • Choi, Bo-Seung;Kang, Hyun-Cheol;Han, Sang-Tae
    • Communications for Statistical Applications and Methods / v.18 no.1 / pp.23-34 / 2011
  • Property auctions have become a new method of real estate investment because the property auction market grows in tandem with the real estate market. This study focuses on a statistical model for predicting the bid price rate, which is the main index for participants in the real estate auction market. To estimate the monthly bid price rate, we propose a new method that compensates for the means of regions and terms and reduces the prediction error using decision tree analysis. We also propose a linear regression model to predict the bid price rate of an individual auction property. We applied the proposed model to apartment auction properties, predicting the bid price rate and classifying individual properties into auction grades.

A Study on Regional Variations for Disease-specific Cardiac Arrest (질환성 심정지 발생의 지역별 변이에 관한 연구)

  • Park, Il-Su;Kim, Eun-Ju;Kim, Yoo-Mi;Hong, Sung-Ok;Kim, Young-Taek;Kang, Sung-Hong
    • Journal of Digital Convergence / v.13 no.1 / pp.353-366 / 2015
  • The purpose of this study was to examine how region-specific characteristics affect the occurrence of cardiac arrest. For the analysis, we built a unique data set combining key indicators of health conditions and cardiac arrest occurrence for 244 small administrative districts. The data came from two main sources at the Korea Centers for Disease Control and Prevention (KCDC): the 2010 Out-of-Hospital Cardiac Arrest Surveillance and the Community Health Survey. We analyzed the data using multiple regression, geographically weighted regression, and a decision tree. The decision tree model was selected as the final model to explain regional variations in cardiac arrest. The factors behind regional variation in cardiac arrest occurrence were population density, hypertension diagnosis rate, stress level, screening participation level, high drinking rate, and smoking rate. Taken as a whole, accounting for geographical variations in health conditions, health behaviors, and other socioeconomic factors is important when implementing regionally customized health policy to reduce cardiac arrest occurrence.

An Analysis for Price Determinants of Small and Medium-sized Office Buildings Using Data Mining Method in Gangnam-gu (데이터마이닝기법을 활용한 강남구 중소형 오피스빌딩의 매매가격 결정요인 분석)

  • Mun, Keun-Sik;Choi, Jae-Gyu;Lee, Hyun-seok
    • The Journal of the Korea Contents Association / v.15 no.7 / pp.414-427 / 2015
  • Most studies of the office market have focused on large-scale office buildings. There is little, if any, research on small and medium-sized office buildings because of the lack of data. This study uses 1,056 self-collected records from Gangnam-gu and analyzes them using not only a linear regression model but also data mining methods. The results provide investors with information on price determinants for small and medium-sized office buildings, in comparison with large-scale office buildings. The important variables include street frontage condition, commercial zoning, and distance to a subway station.

Unit Nonresponse Weighting Adjustment Using Regression Tree (회귀나무를 이용한 무응답 가중치 조정)

  • Kim, Se-Mi;Lee, Seok-Hun
    • Proceedings of the Korean Association for Survey Research Conference / 2005.12a / pp.169-183 / 2005
  • This paper considers the formation of nonresponse weighting adjustment cells for handling unit nonresponse in sample surveys. We propose a multivariate regression tree method that segments units using the variable of interest and the estimated response probability simultaneously to construct effective nonresponse adjustment cells. Two cases are considered: one uses only response data, and the other uses both response and nonresponse data. The two cases are compared in terms of bias.
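Once adjustment cells exist, a common weighting-class adjustment inflates each respondent's weight by the ratio of the total sample weight in the cell to the respondent weight in the cell. The sketch below illustrates that generic step only. It assumes the cells are already given, whereas the paper forms them with a regression tree, and the function name `adjust_weights` and the sample records are hypothetical.

```python
from collections import defaultdict


def adjust_weights(units):
    """Weighting-class nonresponse adjustment.

    units: list of dicts with keys 'cell', 'weight', 'responded'.
    Returns the respondents only, with weights inflated so that each
    cell's respondents carry the full sample weight of that cell.
    Assumes every cell contains at least one respondent.
    """
    total = defaultdict(float)  # weight of all sampled units per cell
    resp = defaultdict(float)   # weight of responding units per cell
    for u in units:
        total[u["cell"]] += u["weight"]
        if u["responded"]:
            resp[u["cell"]] += u["weight"]
    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u["cell"]] / resp[u["cell"]]
            adjusted.append({**u, "weight": u["weight"] * factor})
    return adjusted


# Made-up sample: cell A has a 50% response rate, cell B also 50%.
sample = [
    {"cell": "A", "weight": 10.0, "responded": True},
    {"cell": "A", "weight": 10.0, "responded": True},
    {"cell": "A", "weight": 10.0, "responded": False},
    {"cell": "A", "weight": 10.0, "responded": False},
    {"cell": "B", "weight": 5.0, "responded": True},
    {"cell": "B", "weight": 5.0, "responded": False},
]
adjusted = adjust_weights(sample)
```

The adjusted respondent weights sum to the original total sample weight, which is the property the adjustment is meant to preserve.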

The influence analysis of admission variables on academic achievements (학업성취도에 대한 대입전형 요인들의 영향력 분석)

  • Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.21 no.4 / pp.729-736 / 2010
  • In this paper, we analyze the influence of admission variables, including student characteristics, on the academic achievement of freshmen at K university in Busan. First, multiple regression analysis is used to examine the main effects of the admission variables, including student characteristics, on academic achievement. Decision tree analysis is then used to examine the interaction effects of the admission variables on academic achievement. The results may help K university design effective admissions strategies for recruiting students.

A Prediction Model of Timely Processing on Medical Service using Classification and Regression Tree (분류회귀나무를 이용한 의료서비스 적기처리 예측모형)

  • Lee, Jong-Chan;Jeong, Seung-Woo;Lee, Won-Young
    • Journal of IKEEE / v.20 no.1 / pp.16-25 / 2016
  • Turnaround time (TAT) for imaging tests, which are necessary for making a medical diagnosis, is directly related to patient waiting time and is one of the important performance criteria for medical services. In this paper, we measured the TAT of major imaging tests to see whether it met the reference point set by the medical institutions. Prediction results from the classification and regression tree (CART) algorithm identified "clinics", "diagnosis", "modality", and "test month" as the main factors for timely processing. This study contributes a means of preventing delays in medical services in advance.
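CART grows a classification tree by repeatedly choosing the split that most reduces impurity, commonly measured with the Gini index. The following minimal sketch searches for the best threshold on a single numeric feature with binary labels. It is an illustration under stated assumptions, not the paper's implementation: the function names and the data are hypothetical, and the paper's reported factors are largely categorical rather than numeric.

```python
def gini(labels):
    """Gini impurity for binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # share of class 1
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct
    feature values. Returns (None, parent impurity) if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up data: a numeric feature and whether processing was timely (1) or not (0).
feature = [20, 25, 30, 50, 60, 70]
on_time = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(feature, on_time)
```

A full CART implementation would apply this search to every feature, recurse on the two child nodes, and prune the resulting tree.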

An application to Multivariate Zero-Inflated Poisson Regression Model

  • Kim, Kyung-Moo
    • Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.177-186 / 2003
  • Zero-inflated Poisson regression is a model for count data with excess zeros. When correlated response variables are of interest, the univariate zero-inflated regression model must be extended to a multivariate model. In this paper, we study and simulate the multivariate zero-inflated regression model and apply it to a real example. Regression parameters are estimated by maximum likelihood. We also compare the fit of the multivariate zero-inflated Poisson regression model with that of a decision tree model.
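For reference, the univariate zero-inflated Poisson distribution mixes a point mass at zero (with probability `pi`) with an ordinary Poisson count (with mean `lam`). The sketch below covers only this univariate probability mass function, not the multivariate extension or the regression structure studied in the paper. The function name `zip_pmf` and the parameter values are hypothetical.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson variable.

    lam: mean of the Poisson component.
    pi:  probability of a structural (extra) zero.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero arises either structurally or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# With lam = 2.0 and pi = 0.3, zeros are far more likely than under
# a plain Poisson distribution with the same mean parameter.
p_zero = zip_pmf(0, 2.0, 0.3)
total_mass = sum(zip_pmf(k, 2.0, 0.3) for k in range(60))
```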
