• Title/Summary/Keyword: extreme learning machine


Prediction and Analysis of PM2.5 Concentration in Seoul Using Ensemble-based Model (앙상블 기반 모델을 이용한 서울시 PM2.5 농도 예측 및 분석)

  • Ryu, Minji;Son, Sanghun;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.38 no.6_1 / pp.1191-1205 / 2022
  • Particulate matter (PM), an air pollutant with complex and widespread causes, is classified by particle size. PM2.5 particles are very small and, when inhaled, can cause diseases of the human respiratory tract or cardiovascular system. To prepare for these risks, state-led management and preventive monitoring and forecasting are important. This study predicted PM2.5 in Seoul, where high concentrations of fine dust occur frequently, using two ensemble models, random forest (RF) and extreme gradient boosting (XGB), with 15 local data assimilation and prediction system (LDAPS) weather-related factors, aerosol optical depth (AOD), and 4 chemical factors as independent variables. The performance and factor importance of the two models were evaluated, and a seasonal model analysis was also performed. RF achieved R2 = 0.85 and XGB achieved R2 = 0.91, confirming that XGB was the more suitable model for PM2.5 prediction. In the seasonal analysis, prediction performance was good relative to the observed values in spring, when concentrations were high. In this study, PM2.5 in Seoul was predicted using various factors, and an ensemble-based PM2.5 prediction model with good performance was constructed.

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung;Joo, Jihwan;Han, Ingoo
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.83-102 / 2021
  • The government recently announced various policies for developing the big-data and artificial intelligence fields, giving the public a great opportunity through the disclosure of high-quality data held by public institutions. KSURE (Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea and is strongly committed to backing export companies through various systems. Nevertheless, there are still few cases of business models realized through big-data analysis. Against this background, this paper aims to develop a new business model that can be applied to ex-ante prediction of the likelihood of a credit guarantee insurance accident. We use internal data from KSURE, which supports export companies in Korea, and apply machine learning models. We then compare the performance of the predictive models, including Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN (Deep Neural Network). For decades, many researchers have tried to find better models for predicting bankruptcy, since ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. The prediction of financial distress or bankruptcy originated with Smith (1930), Fitzpatrick (1932), and Merwin (1942). One of the most famous models is Altman's Z-score model (Altman, 1968), which is based on multiple discriminant analysis and is still widely used in both research and practice. Altman proposes a score model that uses five key financial ratios to predict the probability of bankruptcy within the next two years. Ohlson (1980) introduces a logit model to address some limitations of previous models. Furthermore, Elmer and Borowski (1988) develop and examine a rule-based, automated system that conducts the financial analysis of savings and loans.
Since the 1980s, researchers in Korea have studied the prediction of financial distress or bankruptcy. Kim (1987) analyzes financial ratios and develops a prediction model. Han et al. (1995, 1996, 1997, 2003, 2005, 2006) construct prediction models using various techniques, including artificial neural networks. Yang (1996) introduces multiple discriminant analysis and a logit model. Kim and Kim (2001) use artificial neural network techniques for ex-ante prediction of insolvent enterprises. Since then, many scholars have tried to predict financial distress or bankruptcy more precisely using diverse models such as Random Forest or SVM. One major distinction between our research and previous research is that we examine the predicted probability of default for each sample case, rather than only the classification accuracy of each model over the entire sample. Most predictive models in this paper achieve a classification accuracy of about 70% on the entire sample. Specifically, the LightGBM model shows the highest accuracy, 71.1%, and the Logit model the lowest, 69%. However, we find that these results are open to multiple interpretations. In this business context, more emphasis must be placed on minimizing type 2 error, which causes more harmful operating losses for the guaranty company. Thus, we also compare classification accuracy after splitting the predicted probability of default into ten equal intervals. Examined by interval, the Logit model has the highest accuracy, 100%, for predicted default probabilities of 0~10%, but a relatively low accuracy of 61.5% for predicted default probabilities of 90~100%.
On the other hand, Random Forest, XGBoost, LightGBM, and DNN give more desirable results: they show higher accuracy in both the 0~10% and 90~100% intervals of the predicted probability of default, but lower accuracy around a predicted probability of 50%. In terms of the distribution of samples across intervals, both the LightGBM and XGBoost models place a relatively large number of samples in the 0~10% and 90~100% intervals. Although the Random Forest model has an advantage in classification accuracy on a small number of cases, LightGBM or XGBoost could be the more desirable model because they classify a large number of cases into the two extreme intervals of the predicted probability of default, even allowing for their relatively low classification accuracy. Considering the importance of type 2 error and total prediction accuracy, XGBoost and DNN show superior performance. Random Forest and LightGBM follow with good results, while logistic regression shows the worst performance. However, each predictive model has a comparative advantage under different evaluation standards. For instance, the Random Forest model shows almost 100% accuracy for samples expected to have a high probability of default. Taken together, more comprehensive ensemble models can be constructed that combine multiple machine learning classifiers and use majority voting to maximize overall performance.
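
The interval-wise evaluation and the majority-voting idea described in this abstract can be illustrated with a small sketch. It assumes binary labels (1 = default), a 0.5 decision threshold, and equal-width probability bins; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def accuracy_by_probability_bin(probs, labels, n_bins=10, threshold=0.5):
    """Return {bin_index: (n_samples, accuracy)} for equal-width probability bins."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        pred = 1 if p >= threshold else 0     # assumed decision rule
        counts[b] += 1
        hits[b] += int(pred == y)
    return {b: (counts[b], hits[b] / counts[b]) for b in sorted(counts)}

def majority_vote(predictions_per_model):
    """Combine 0/1 predictions from several models by simple majority."""
    n_models = len(predictions_per_model)
    return [1 if 2 * sum(votes) > n_models else 0
            for votes in zip(*predictions_per_model)]

# Toy data, made up for illustration only.
probs = [0.03, 0.08, 0.45, 0.55, 0.92, 0.97, 0.95, 0.05]
labels = [0, 0, 1, 0, 1, 1, 0, 0]
table = accuracy_by_probability_bin(probs, labels)
for b, (n, acc) in table.items():
    print(f"{b * 10}~{(b + 1) * 10}%: n={n}, accuracy={acc:.2f}")

votes = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

Reporting the sample count next to each interval's accuracy matters here, because a model can look accurate in an extreme interval while placing very few cases in it.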

Monitoring Ground-level SO2 Concentrations Based on a Stacking Ensemble Approach Using Satellite Data and Numerical Models (위성 자료와 수치모델 자료를 활용한 스태킹 앙상블 기반 SO2 지상농도 추정)

  • Choi, Hyunyoung;Kang, Yoojin;Im, Jungho;Shin, Minso;Park, Seohui;Kim, Sang-Min
    • Korean Journal of Remote Sensing / v.36 no.5_3 / pp.1053-1066 / 2020
  • Sulfur dioxide (SO2) is primarily released through industrial, residential, and transportation activities, and creates secondary air pollutants through chemical reactions in the atmosphere. Long-term exposure to SO2 can harm the human body by causing respiratory or cardiovascular disease, which makes effective and continuous monitoring of SO2 crucial. In South Korea, SO2 is monitored at ground stations, but this does not provide spatially continuous information on SO2 concentrations. Thus, this research estimated spatially continuous ground-level SO2 concentrations at 1 km resolution over South Korea through the synergistic use of satellite data and numerical models. A stacking ensemble approach, fusing multiple machine learning algorithms at two levels (i.e., base and meta), was adopted for ground-level SO2 estimation using data from January 2015 to April 2019. Random forest and extreme gradient boosting were used as base models, and multiple linear regression was adopted as the meta-model. Cross-validation showed that the meta-model improved performance by 25% compared to the base models, yielding a correlation coefficient of 0.48 and a root-mean-square error of 0.0032 ppm. In addition, the temporal transferability of the approach was evaluated on one year of data not used in model development. The spatial distribution of ground-level SO2 concentrations from the proposed model agreed with the general seasonality of SO2 and the temporal patterns of emission sources.
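
The two-level stacking structure described in this abstract can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the base models here are single-feature linear regressions standing in for random forest and extreme gradient boosting, the data are synthetic, and all function names are hypothetical. Only the overall scheme (out-of-fold base predictions fed to a multiple linear regression meta-model) follows the text.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(rows, y):
    """Ordinary least squares with intercept; returns [b0, b1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict_linear(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]

def take(X, idx, feats):
    """Select rows idx and feature columns feats from X."""
    return [[X[i][f] for f in feats] for i in idx]

def stack_fit(X, y, feature_sets, n_folds=3):
    """Level 0: out-of-fold base predictions. Level 1: linear meta-model."""
    n = len(y)
    oof = [[0.0] * len(feature_sets) for _ in range(n)]
    for fold in range(n_folds):
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        y_train = [y[i] for i in train]
        for m, feats in enumerate(feature_sets):
            coef = fit_linear(take(X, train, feats), y_train)
            for i, p in zip(test, predict_linear(coef, take(X, test, feats))):
                oof[i][m] = p
    meta = fit_linear(oof, y)  # meta-model sees only held-out predictions
    everything = list(range(n))
    bases = [fit_linear(take(X, everything, f), y) for f in feature_sets]
    return bases, meta

def stack_predict(bases, meta, feature_sets, X):
    idx = list(range(len(X)))
    level1 = [predict_linear(c, take(X, idx, f)) for c, f in zip(bases, feature_sets)]
    return predict_linear(meta, list(zip(*level1)))

def mse(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

# Synthetic data: each stand-in base model sees only one of the two features.
X = [[i % 5, i // 5] for i in range(25)]
y = [2 * a + 3 * b + 1 for a, b in X]
feature_sets = [[0], [1]]
bases, meta = stack_fit(X, y, feature_sets)
stacked = stack_predict(bases, meta, feature_sets, X)
base_preds = [predict_linear(c, take(X, list(range(len(X))), f))
              for c, f in zip(bases, feature_sets)]
```

Training the meta-model on out-of-fold predictions, rather than on predictions for data the base models have already seen, is the usual way to keep the second level from simply rewarding overfitted base models.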

Computational estimation of the earthquake response for fibre reinforced concrete rectangular columns

  • Liu, Chanjuan;Wu, Xinling;Wakil, Karzan;Jermsittiparsert, Kittisak;Ho, Lanh Si;Alabduljabbar, Hisham;Alaskar, Abdulaziz;Alrshoudi, Fahed;Alyousef, Rayed;Mohamed, Abdeliazim Mustafa
    • Steel and Composite Structures / v.34 no.5 / pp.743-767 / 2020
  • Owing to its impressive flexural performance, enhanced compressive strength, and more constrained crack propagation, fibre-reinforced concrete (FRC) has been widely employed in construction. The majority of experimental studies have focused on the seismic behavior of FRC columns. Based on valid experimental data from previous studies, the current study evaluated the seismic response and compressive strength of FRC rectangular columns using hybrid metaheuristic techniques. Because of the non-linearity of seismic data, an adaptive neuro-fuzzy inference system (ANFIS) was combined with metaheuristic algorithms. A database of 317 datasets from FRC column tests was used to determine the most influential factor on the ultimate strength of FRC rectangular columns subjected to simulated seismic loading. ANFIS was combined with particle swarm optimization (PSO) and a genetic algorithm (GA). To analyze the results, an extreme learning machine (ELM), an established prediction method, was used in parallel. The variable selection procedure chooses the most dominant parameters affecting the ultimate strength of FRC rectangular columns subjected to simulated seismic loading. The results show that ANFIS-PSO predicted the seismic lateral load with R2 = 0.857 and 0.902 for the test and train phases, respectively, and it was therefore selected as the lateral load estimator. For compressive strength, ELM gave R2 = 0.657 and 0.862 for the test and train phases, respectively. The results show that the seismic lateral force is more predictable than the compressive strength of FRC rectangular columns, with the best results obtained for lateral force prediction.
Compressive strength prediction showed a significant deviation above 40 MPa, which could be related to considerable non-linearity and possible empirical shortcomings. Finally, employing the ANFIS-GA and ANFIS-PSO techniques to evaluate the seismic response of FRC is a promising and reliable alternative to costly and time-consuming experimental tests.
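
An extreme learning machine keeps its hidden-layer weights random and fixed, and solves only the output weights by least squares. A minimal sketch of that idea follows, assuming a single input variable, tanh activations, and a small ridge term added for numerical stability. These choices, the function names, and the sine-curve toy data are illustrative assumptions and have nothing to do with the FRC column database.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def elm_fit(xs, ys, n_hidden=10, ridge=1e-6, seed=0):
    """Fit a one-input ELM and return a prediction function."""
    rng = random.Random(seed)
    # Hidden weights and biases are drawn once and never trained.
    w = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def hidden(x):
        # A leading constant 1.0 acts as the output-layer bias.
        return [1.0] + [math.tanh(wi * x + bi) for wi, bi in zip(w, b)]

    H = [hidden(x) for x in xs]
    k = n_hidden + 1
    # Regularized normal equations: (H^T H + ridge * I) beta = H^T y
    hth = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    hty = [sum(h[i] * y for h, y in zip(H, ys)) for i in range(k)]
    beta = solve(hth, hty)
    return lambda x: sum(bj * hj for bj, hj in zip(beta, hidden(x)))

# Toy regression target, made up for illustration.
xs = [-3 + 6 * i / 49 for i in range(50)]
ys = [math.sin(x) for x in xs]
model = elm_fit(xs, ys)
train_mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)
print(f"train MSE = {train_mse:.4g}, target variance = {var_y:.4g}")
```

Because only a linear system is solved, training is fast, but the fit depends on the random draw of hidden weights, so results are usually reported for a fixed seed or averaged over several.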

Feature Extraction Algorithm for Distant Unmanned Aerial Vehicle Detection (원거리 무인기 신호 식별을 위한 특징추출 알고리즘)

  • Kim, Juho;Lee, Kibae;Bae, Jinho;Lee, Chong Hyun
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.3 / pp.114-123 / 2016
  • This paper proposes and verifies an effective feature extraction method for unmanned aerial vehicle (UAV) detection. UAV engine sound is a harmonic complex tone whose frequency ratios are integers and whose frequency varies continuously in time. Using these characteristics, we propose a feature vector composed of the mean and standard deviation of the difference between the fundamental frequency and the 1st overtone, as well as the mean variation of their frequencies. Simulation showed that the proposed feature vector discriminates the target signal well from various interfering signals, including those whose frequency varies with time. Comparison of Fisher scores shows that the three frequency-based features discriminate measured UAV signals well at low signal-to-noise ratio (SNR). Detection performance with simulated interference signals was compared against MFCC using an ELM classifier, and the proposed feature vector showed a 37.6% performance improvement. As the SNR increases with time, the proposed feature can detect the target signal earlier than MFCC, which needs 4.5 dB higher signal power to detect the target.
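
One possible reading of the three-element feature vector in this abstract is sketched below. It assumes per-frame estimates of the fundamental frequency and 1st overtone are already available, interprets "mean variation" as the mean absolute frame-to-frame change averaged over both tracks, and uses a common two-class form of the Fisher score. The function names and the numbers are hypothetical, and the paper's exact definitions may differ.

```python
from statistics import mean, pstdev, pvariance

def harmonic_features(f0, f1):
    """Return (mean diff, std diff, mean variation) from two frequency tracks.

    f0: per-frame fundamental frequency estimates (Hz)
    f1: per-frame 1st-overtone frequency estimates (Hz)
    """
    diff = [b - a for a, b in zip(f0, f1)]
    # Assumed definition: mean absolute change between consecutive frames.
    var0 = mean([abs(b - a) for a, b in zip(f0, f0[1:])])
    var1 = mean([abs(b - a) for a, b in zip(f1, f1[1:])])
    return (mean(diff), pstdev(diff), (var0 + var1) / 2)

def fisher_score(class_a, class_b):
    """Two-class Fisher score of one feature: (mu_a - mu_b)^2 / (var_a + var_b)."""
    return (mean(class_a) - mean(class_b)) ** 2 / (pvariance(class_a) + pvariance(class_b))

# Toy frequency tracks, made up for illustration.
f0 = [100, 101, 102, 103]
f1 = [200, 202, 204, 206]
feats = harmonic_features(f0, f1)
score = fisher_score([1, 2, 3], [7, 8, 9])
print(feats, score)
```

A larger Fisher score means the class means are far apart relative to the within-class spread, which is how a single feature's discriminating power is typically ranked.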