• Title/Summary/Keyword: 랜덤 포레스트 대체 ("random forest replacement")

6 search results (processing time: 0.03 seconds)

Comparison of Data Reconstruction Methods for Missing Value Imputation (결측값 대체를 위한 데이터 재현 기법 비교)

  • Cheongho Kim;Kee-Hoon Kang
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.603-608 / 2024
  • Nonresponse and missing values arise when samples drop out or respondents avoid answering survey questions. In such cases, information may be lost and inferences may be biased, so missing values need to be replaced with appropriate values. In this paper, we compare several missing value imputation methods: mean imputation, linear regression, random forest, K-nearest neighbors, and the deep-learning-based autoencoder and denoising autoencoder. We explain each method and compare them using simulated continuous data and real data. The results confirm that, in most cases, the random forest and denoising autoencoder imputation methods perform better than the others.
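As a rough illustration of two of the simpler methods compared above, the following Python sketch implements mean imputation and K-nearest-neighbor imputation on a small table in which missing entries are `None`. It is not the authors' implementation: the function names `mean_impute` and `knn_impute`, the Euclidean distance over jointly observed columns, and the default `k=2` are assumptions made for illustration. Random forest and autoencoder-based imputation are not shown.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Replace each missing value (None) with the mean of its column's observed values.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing value with the column mean over the k nearest donor rows.

    Distance is Euclidean over columns observed in both rows (an assumption);
    only rows that observe the target column are used as donors.
    """

    def dist(a: Row, b: Row) -> float:
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    result = []
    for i, row in enumerate(rows):
        filled = list(row)  # copy so the input table is not modified
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
            donors.sort(key=lambda r: dist(row, r))
            nearest = donors[:k]
            filled[j] = sum(r[j] for r in nearest) / len(nearest)
        result.append(filled)
    return result
```

Mean imputation fills every gap in a column with the same value, whereas K-NN borrows values from similar rows. On the table `[[1.0, 2.0], [2.0, None], [3.0, 6.0], [10.0, 20.0]]`, mean imputation fills the gap with about 9.33, while `knn_impute(..., k=2)` fills it with 4.0.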

Predicting Photovoltaic Power Generation with Random Forests (랜덤 포레스트를 이용한 태양광 발전량 예측)

  • Lee, Woonghee;Kim, Younghoon
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.397-400 / 2016
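  • Photovoltaic power generation has been extensively developed to replace existing energy sources that may be depleted. The inverters of photovoltaic modules measure and store various attributes that affect power output. In this study, we added weather data, an external factor that affects power generation, to this inverter data and used random forests to determine experimentally how many days of past data yield the highest prediction performance. Considering past data ranging from 2 days up to 365 days, prediction performance was highest when about 5 days of past data were used, and it tended to decrease as the period considered grew longer.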
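The experiment above varies how many past days feed the model. A minimal sketch of building such a windowed dataset is shown below, assuming each day is represented by a feature vector (for example, inverter measurements plus weather variables) and the target is that day's generation. The function name `make_window_dataset` and the flat concatenation of past days are illustrative assumptions, not the authors' preprocessing.

```python
from typing import List, Sequence, Tuple


def make_window_dataset(
    days: Sequence[Sequence[float]],
    target: Sequence[float],
    window: int,
) -> Tuple[List[List[float]], List[float]]:
    """Pair each day's target with the concatenated features of the previous `window` days.

    days[t] is the feature vector for day t (hypothetical layout, e.g. output
    plus weather); target[t] is the generation to predict for day t.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(window, len(days)):
        row: List[float] = []
        for day in days[t - window:t]:
            row.extend(day)  # flatten past days into one feature row
        X.append(row)
        y.append(target[t])
    return X, y
```

To reproduce the kind of comparison described, one could build datasets for several window lengths (for example 2, 5, 30, 365), fit a random forest regressor such as scikit-learn's `RandomForestRegressor` on each, and compare validation error across windows.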

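A Machine Learning-Based Study on Predicting Administrative Issue Designation in the KOSDAQ Market (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구)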

  • Yun, Yang-Hyeon;Kim, Tae-Gyeong;Kim, Su-Yeong;Park, Yong-Gyun
    • Korean Society of Business Venturing: Conference Proceedings (한국벤처창업학회:학술대회논문집) / 2021.11a / pp.185-187 / 2021
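  • The administrative issue designation system is a market regulation that warns of deteriorating financial health among listed companies, giving those companies an opportunity to recover and warning investors of investment risk. This study predicts administrative issue designation using samples of financial data from companies designated as administrative issues and from non-designated companies. The analysis methods were logistic regression, decision tree, support vector machine, soft voting, random forest, and LightGBM. LightGBM, with a classification accuracy of 82.73%, was the best prediction model, and the model with the lowest classification accuracy was the decision tree, at 71.94%. In general, ensemble learning models showed higher predictive performance than single learning models.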
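Soft voting, one of the ensemble methods listed above, averages the class probabilities predicted by several models and picks the class with the highest average. The sketch below shows that rule together with a plain accuracy calculation; the function names and the probability values in the example are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models and return the argmax class per sample.

    prob_lists[m][i][c] is model m's predicted probability that sample i has class c.
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    predictions = []
    for i in range(n_samples):
        n_classes = len(prob_lists[0][i])
        avg = [sum(model[i][c] for model in prob_lists) / n_models for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=lambda c: avg[c]))
    return predictions


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Soft voting can disagree with majority (hard) voting. With hypothetical probabilities `[0.4, 0.6]`, `[0.7, 0.3]` and `[0.45, 0.55]` for one sample, two of three models favor class 1, but the averaged probabilities (about 0.517 vs 0.483) favor class 0, because the confident model outweighs the two hesitant ones.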


Comparison of Machine Learning Techniques in Urban Weather Prediction using Air Quality Sensor Data (실외공기측정기 자료를 이용한 도심 기상 예측 기계학습 모형 비교)

  • Jong-Chan Park;Heon Jin Park
    • The Journal of Bigdata / v.6 no.2 / pp.39-49 / 2021
  • Recently, large and diverse weather data have been collected by sensors from various sources. Efforts to predict fine dust concentrations with machine learning are widespread, and this study compares PM10 and PM2.5 prediction models using data from 840 outdoor air quality sensors installed throughout the city. Predicting fine dust concentration 5 minutes ahead allows information to be provided in real time, and can serve as a basis for developing models with 10-minute, 30-minute, and 1-hour horizons. Data preprocessing, such as noise removal and missing value replacement, was performed, and derived variables that account for temporal and spatial factors were created. The model parameters were selected using the response surface method. XGBoost, random forest, and deep learning (multilayer perceptron) were used as predictive models, and their performance was compared based on the differences between observed fine dust concentrations and predicted values.
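The abstract compares models by the difference between observed and predicted concentrations but does not name a metric. Assuming root mean squared error (RMSE) and mean absolute error (MAE), two common choices, a minimal sketch of the comparison could look like this; `rank_models` and the model names in the example are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: penalizes large errors more heavily."""
    _check(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average magnitude of the errors."""
    _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rank_models(observed: Sequence[float], predictions: Dict[str, Sequence[float]]) -> List[str]:
    """Return model names ordered from lowest to highest RMSE."""
    return sorted(predictions, key=lambda name: rmse(observed, predictions[name]))
```

For observed values `[10, 20, 30]` and predictions `[12, 18, 33]`, the errors are 2, -2 and 3, so MAE is 7/3 (about 2.33) and RMSE is the square root of 17/3 (about 2.38).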

Status of Groundwater Potential Mapping Research Using GIS and Machine Learning (GIS와 기계학습을 이용한 지하수 가능성도 작성 연구 현황)

  • Lee, Saro;Fetemeh, Rezaie
    • Korean Journal of Remote Sensing / v.36 no.6_1 / pp.1277-1290 / 2020
  • Water resources, consisting of surface water and groundwater, are considered one of the most important natural resources worldwide. Since the last century, rapid population growth, accelerated industrialization, and explosive urbanization have increased demand for groundwater for domestic, industrial, and agricultural use. Better groundwater management can play a crucial role in sustainable development; therefore, accurately locating groundwater through groundwater potential mapping is indispensable. In recent years, integrating machine learning techniques with Geographical Information Systems (GIS) and Remote Sensing (RS) has become a popular and effective approach to groundwater potential mapping. To determine the status of this integrated approach, we conducted a systematic review of 94 directly relevant papers published over the previous six years (2015-2020). According to the literature review, the number of studies published annually increased rapidly over time. The study areas spanned 15 countries, and 85.1% of studies focused on Iran, India, China, South Korea, and Iraq. Twenty variables were frequently used in groundwater potential investigations, of which nine were almost always present: slope, lithology (geology), land use/land cover (LU/LC), drainage/river density, altitude (elevation), topographic wetness index (TWI), distance from river, rainfall, and aspect. Among the machine learning techniques, data integration was carried out using random forest, support vector machine, and boosted regression tree. Our study shows that, for optimal results, groundwater potential mapping must be used as a tool to complement field work rather than as a low-cost substitute for it. Consequently, more studies should be conducted to improve the generalizability and precision of groundwater potential maps.

Study on Predicting the Designation of Administrative Issue in the KOSDAQ Market Based on Machine Learning Based on Financial Data (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구: 재무적 데이터를 중심으로)

  • Yoon, Yanghyun;Kim, Taekyung;Kim, Suyeong
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.17 no.1 / pp.229-249 / 2022
  • This paper investigates machine learning models for predicting the designation of administrative issues in the KOSDAQ market using various techniques. When a company in the Korean stock market is designated as an administrative issue, the market treats the event itself as negative information, causing losses to the company and its investors. The purpose of this study is to evaluate alternative methods for developing an artificial intelligence service that assesses early, from companies' financial ratios, the likelihood of administrative issue designation and helps investors manage portfolio risk. Twenty-one financial ratios representing profitability, stability, activity, and growth were used as independent variables. Financial data of administrative issue and non-administrative issue companies were sampled from 2011 to 2020, the period in which K-IFRS was applied. Logistic regression, decision tree, support vector machine, random forest, and LightGBM were used to predict administrative issue designation. LightGBM, with a classification accuracy of 82.73%, was the best prediction model, and the model with the lowest classification accuracy was the decision tree, at 71.94%. Examining the top three variables by importance in the decision tree-based learning models, ROE (net profit) and the capital stock turnover ratio were common to every model, making them relatively important variables in administrative issue designation. In general, ensemble learning models showed higher predictive performance than single learning models.
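The last step described above, finding variables that appear among the top three by importance in every tree-based model, amounts to taking each model's top-k features and intersecting the resulting sets. The sketch below assumes importances are already available as a name-to-score mapping per model; the function names, model labels, and feature names (`x1` to `x4`) are hypothetical placeholders, and the scores in the example are made up rather than taken from the study.

```python
from typing import Dict, List, Set


def top_k_features(importances: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k feature names with the highest importance scores."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


def common_top_features(models: Dict[str, Dict[str, float]], k: int = 3) -> Set[str]:
    """Features that rank in the top k for every model (set intersection)."""
    tops = [set(top_k_features(imp, k)) for imp in models.values()]
    return set.intersection(*tops) if tops else set()
```

For example, if a hypothetical tree model ranks `x1, x2, x3` highest and a hypothetical forest ranks `x1, x4, x2` highest, `common_top_features` returns `{"x1", "x2"}`: only those two variables are important in both models.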