• Title/Summary/Keyword: extreme value prediction

A Study on the Application of Generalized Extreme Value Distribution to the Variation of Annual Maximum Surge Heights (연간 최대해일고 변동의 일반화 극치분포 적용 연구)

  • Kwon, Seok-Jae;Park, Jeong-Soo;Lee, Eun-Il
    • Journal of Korean Society of Coastal and Ocean Engineers / v.21 no.3 / pp.241-253 / 2009
  • This study investigates the long-term variation of annual maximum surge heights (AMSH) and the main characteristics of high surge events, and statistically evaluates the AMSH using more than 30 years of sea level data from the Yeosu and Tongyeong tidal stations. Linear regression shows long-term upward trends in the AMSH of 34.5 cm/34 yr at Yeosu and 33.6 cm/31 yr at Tongyeong, considerably higher than those at Sokcho and Mukho on the east coast. At Yeosu and Tongyeong, 71% and 68% of the AMSH, respectively, occur during typhoon events, and the highest surge records are mostly produced by typhoons. A generalized extreme value (GEV) distribution that includes time as a variable is applied to detect a time trend in the AMSH. In addition, the Gumbel distribution is compared with it using a likelihood ratio test to determine which fits the data better. The return level and its 90% confidence interval are obtained for statistical prediction of the future trend. Preventing the growing storm surge damage caused by intensified typhoons requires continued analysis and prediction of surge events associated with climate change.
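The return-level and likelihood-ratio steps described in this abstract can be illustrated with a minimal sketch. It assumes the standard GEV parameterization (location `mu`, scale `sigma`, shape `xi`) and a linear trend in the location parameter, `mu0 + mu1 * t`; the abstract does not state the form of the time dependence, so the linear form and all function names here are illustrative assumptions, not the authors' implementation.

```python
import math

def gev_return_level(mu, sigma, xi, T):
    """Level exceeded on average once every T years under GEV(mu, sigma, xi)."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-9:
        # Gumbel limit (shape parameter equal to zero)
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

def trended_return_level(mu0, mu1, sigma, xi, t, T):
    """Return level at time t, assuming (hypothetically) a linear location trend."""
    return gev_return_level(mu0 + mu1 * t, sigma, xi, T)

def likelihood_ratio_pvalue(loglik_full, loglik_restricted):
    """p-value of a likelihood ratio test with one restricted parameter.

    Gumbel is the GEV with xi = 0, so the deviance is compared with a
    chi-square distribution with 1 degree of freedom, whose upper tail
    probability is erfc(sqrt(d / 2)).
    """
    d = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    return math.erfc(math.sqrt(d / 2.0))
```

A small p-value would favour the full GEV over the Gumbel fit; the log-likelihoods themselves would come from whatever fitting routine is used.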

Prediction of the direction of stock prices by machine learning techniques (기계학습을 활용한 주식 가격의 이동 방향 예측)

  • Kim, Yonghwan;Song, Seongjoo
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.745-760 / 2021
  • Stock price prediction has long been a subject of interest in financial markets, and many studies have approached it from various directions. As the efficient market hypothesis introduced in the 1970s gained support, the majority opinion came to be that stock prices cannot be predicted. However, recent advances in predictive models have led to new attempts to predict future prices. Here, we summarize past studies on price prediction by evaluation measure, and predict the direction of the stock prices of Samsung Electronics, LG Chem, and NAVER by applying various machine learning models. In addition to widely used technical indicator variables, accounting indicators such as the price-earnings ratio and price book-value ratio, and outputs of a hidden Markov model, are used as predictors. From the results of our analysis, we conclude that no model shows significantly better accuracy and that the direction of stock prices cannot be predicted with the models used. Considering that the models with extra predictors show relatively high test accuracy, a meaningful improvement in prediction accuracy may be possible if suitable variables reflecting the opinions and sentiments of investors are used.

Heat-Wave Data Analysis based on the Zero-Inflated Regression Models (영-과잉 회귀모형을 활용한 폭염자료분석)

  • Kim, Seong Tae;Park, Man Sik
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2829-2840 / 2018
  • A random variable bounded below by some value is called semi-continuous or zero-inflated when its boundary value is observed more frequently than expected, that is, more often in practice than the assumed probability distribution predicts. When the assumed distribution is continuous, the variable is called semi-continuous; when a discrete distribution is assumed, it is regarded as zero-inflated. In this study, we introduce the two-part model, which consists of one part modelling the binary response and another part modelling the values greater than the boundary value. In particular, zero-inflated regression models are explained using the Poisson and negative binomial distributions. In the real data analysis, we employ zero-inflated regression models to estimate the number of days under extreme heat-wave conditions during the last 10 years in South Korea. Based on the estimation results, we create prediction maps of the estimated number of days under heat-wave advisory and heat-wave warning using universal kriging, a spatial prediction method.
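As an illustration of the zero-inflated idea in this abstract, the following sketch writes out the usual zero-inflated Poisson probability mass function: with probability `pi` the count is a structural zero, and otherwise it follows a Poisson distribution with mean `lam`. The parameter names are assumptions for illustration; the paper's regression models would additionally link `pi` and `lam` to covariates.

```python
import math

def zip_pmf(k, pi, lam):
    """P(Y = k) for a zero-inflated Poisson with zero-inflation probability pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    # Structural zeros add extra mass only at k = 0
    return pi * (k == 0) + (1.0 - pi) * poisson

def zip_mean(pi, lam):
    """Expected count: only the Poisson part contributes to the mean."""
    return (1.0 - pi) * lam
```

With `pi = 0`, the function reduces to the ordinary Poisson pmf, which is one way to see how zero inflation raises the observed frequency of the boundary value.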

Comparative Analysis of Regional and At-site Analysis for the Design Rainfall by Gamma and Non-Gamma Family (Ⅱ) (Gamma 및 비Gamma군 분포모형에 의한 강우의 지점 및 지역빈도 비교분석 (Ⅱ))

  • Lee, Soon-Hyuk;Ryoo, Kyong-Sik
    • Journal of The Korean Society of Agricultural Engineers / v.46 no.5 / pp.15-26 / 2004
  • This study was conducted to derive regional design rainfall by regional frequency analysis based on the regionalization of precipitation. Using this regionalization, the precipitation data were classified into optimal regions across Korea, excluding the Jeju and Ulleung islands. Design rainfalls for the consecutive durations were derived by regional analysis using observed data and data simulated by Monte Carlo techniques. Relative root mean square error (RRMSE), relative bias (RBIAS) and relative reduction (RR) in RRMSE for the design rainfall were computed and compared between the regional and at-site frequency analyses. The comparison shows that the regional frequency analysis procedure performs substantially better than at-site analysis in terms of RRMSE, RBIAS and RR in RRMSE when predicting design rainfall. Consequently, optimal design rainfalls for the classified regions and consecutive durations were derived by regional frequency analysis using the generalized extreme value distribution, which was identified as more suitable than the other applied distributions. Diagrams of the design rainfall derived by regional frequency analysis using L-moments were drawn by region and consecutive duration using GIS techniques.
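The three error measures named in this abstract can be sketched as follows. The definitions used here are the ones commonly seen in regional frequency analysis (errors taken relative to the true quantile over Monte Carlo replicates); the abstract does not spell them out, so treat the exact formulas and function names as assumptions.

```python
import math

def rbias(estimates, true_value):
    """Relative bias: mean of (estimate - truth) / truth over replicates."""
    return sum((e - true_value) / true_value for e in estimates) / len(estimates)

def rrmse(estimates, true_value):
    """Relative root mean square error over replicates."""
    squared = [((e - true_value) / true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared) / len(squared))

def relative_reduction(rrmse_at_site, rrmse_regional):
    """Fraction by which the regional RRMSE undercuts the at-site RRMSE."""
    return (rrmse_at_site - rrmse_regional) / rrmse_at_site
```

Under these definitions, a positive relative reduction means the regional estimate has the smaller RRMSE.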

An advanced technique to predict time-dependent corrosion damage of onshore, offshore, nearshore and ship structures: Part I = generalisation

  • Kim, Do Kyun;Wong, Eileen Wee Chin;Cho, Nak-Kyun
    • International Journal of Naval Architecture and Ocean Engineering / v.12 no.1 / pp.657-666 / 2020
  • A reliable and cost-effective technique for developing a corrosion damage model is introduced to predict the nonlinear time-dependent corrosion wastage of steel structures. This paper (Part I) explains in detail how to derive a generalised mathematical formulation of the corrosion model; verification and application of the developed method, using corrosion data from a ship's ballast tank structure, are covered in the following paper (Part II). In this study, probabilistic approaches including statistical analysis were applied to select the best-fit probability density function (PDF) for the measured corrosion data. The sub-parameters of the selected PDF, e.g., the scale and shape parameters of the largest extreme value distribution, can be formulated as functions of time using a curve fitting method. The proposed technique for formulating the refined time-dependent corrosion wastage model (TDCWM) will be useful for engineers, as it provides an easy and accurate prediction of 1) the starting time of corrosion, 2) the remaining life of the structure, and 3) the nonlinear corrosion damage amount over time. In addition, the outcome can be used to develop the simplified engineering software shown in Appendix B.

Application of artificial neural network model in regional frequency analysis: Comparison between quantile regression and parameter regression techniques.

  • Lee, Joohyung;Kim, Hanbeen;Kim, Taereem;Heo, Jun-Haeng
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.170-170 / 2020
  • Advances in technology have made complex computation on huge data sets possible on an ordinary personal computer. As a result, machine learning methods have been widely applied in hydrology, for example in regression-based regional frequency analysis (RFA). The main purpose of this study is to compare two RFA frameworks based on artificial neural network (ANN) models: the quantile regression technique (QRT-ANN) and the parameter regression technique (PRT-ANN). In the output layer of the ANN model, the QRT-ANN predicts quantiles for various return periods, whereas the PRT-ANN predicts the three parameters of the generalized extreme value distribution. Rainfall gauging sites with record lengths of more than 20 years were selected, and their annual maximum rainfalls and various hydro-meteorological variables were used as the input layer of the ANN model. In training the ANN model, 70% and 30% of the gauging sites were used as the training set and testing set, respectively. For each technique, the ANN model structure, such as the number of hidden layers and nodes, was determined by leave-one-out validation based on the root mean square error (RMSE). To assess the performance of the two frameworks, the RMSEs of the quantiles predicted by the QRT-ANN are compared with those of the PRT-ANN.

Comparative Evaluation of Reproducibility for Spatio-temporal Rainfall Distribution Downscaled Using Different Statistical Methods (통계적 공간상세화 기법의 시공간적 강우분포 재현성 비교평가)

  • Jung, Imgook;Hwang, Syewoon;Cho, Jaepil
    • Journal of The Korean Society of Agricultural Engineers / v.65 no.1 / pp.1-13 / 2023
  • Various techniques for bias correction and statistical downscaling have been developed to overcome the limitations in spatial and temporal resolution, and the errors, of the climate change scenario data required in applied research fields such as agriculture and water resources. In this study, the characteristics of three statistical downscaling methods (i.e., SQM, SDQDM, and BCSA) provided by AIMS were summarized, and the climate change scenarios produced by each method were comparatively evaluated. To compare the average rainfall characteristics of the past period, an index representing average rainfall characteristics was used, and the reproducibility of extreme weather conditions was evaluated with an abnormal-climate-related index. Reproducibility of spatial distribution and variability was compared through variograms and pattern identification of the spatial distribution, using the average value of the index over the past period. For temporal reproducibility, the raw data and each downscaling technique were compared using transition probabilities. The results are presented as a quantitative evaluation of the strengths and weaknesses of each method. Through this comparison, we expect that the strengths and weaknesses of each downscaling technique can be identified and that the most appropriate statistical downscaling technique can be recommended for the relevant research.
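The transition-probability comparison mentioned in this abstract can be illustrated with a two-state (wet/dry) sketch. The abstract does not define its states or thresholds, so the wet-day threshold of 0.1 mm and the function name below are hypothetical choices for illustration only.

```python
def transition_probabilities(rain, wet_threshold=0.1):
    """Estimate P(wet | dry) and P(wet | wet) from a daily rainfall series.

    wet_threshold (mm) is an assumed cut-off between dry and wet days.
    """
    states = [r >= wet_threshold for r in rain]
    counts = {(False, False): 0, (False, True): 0,
              (True, False): 0, (True, True): 0}
    # Count transitions between consecutive days
    for prev, cur in zip(states, states[1:]):
        counts[(prev, cur)] += 1
    from_dry = counts[(False, False)] + counts[(False, True)]
    from_wet = counts[(True, False)] + counts[(True, True)]
    return {
        "P(wet|dry)": counts[(False, True)] / from_dry if from_dry else float("nan"),
        "P(wet|wet)": counts[(True, True)] / from_wet if from_wet else float("nan"),
    }
```

Computing these probabilities for the raw series and for each downscaled series, then comparing them, is one simple way to check how well wet and dry spells are reproduced.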

Development of Tree Detection Methods for Estimating LULUCF Settlement Greenhouse Gas Inventories Using Vegetation Indices (식생지수를 활용한 LULUCF 정주지 온실가스 인벤토리 산정을 위한 수목탐지 방법 개발)

  • Joon-Woo Lee;Yu-Han Han;Jeong-Taek Lee;Jin-Hyuk Park;Geun-Han Kim
    • Korean Journal of Remote Sensing / v.39 no.6_3 / pp.1721-1730 / 2023
  • As awareness of global warming grows around the world, the role of carbon sinks in settlements is increasingly emphasized as a means of achieving carbon neutrality in urban areas. Managing carbon sinks in settlements requires identifying their current status, which demands a great deal of manpower, time, and budget. Therefore, in this study, a map predicting the location of trees in Seoul was created using existing tree location information and Sentinel-2 satellite images. To this end, a tree presence/absence dataset was constructed, and structured data were generated from 16 vegetation indices derived from the satellite images. The Extreme Gradient Boosting (XGBoost) model was trained on these data, and a tree prediction map was created. The correlation between the independent and dependent variables in model training was then investigated using the Shapley values of SHapley Additive exPlanations (SHAP). A comparative analysis was performed between the maps produced for local parts of Seoul and sub-categorized land cover maps. The tree prediction model produced in this study was confirmed to predict as trees even hard-to-detect street trees along main streets.

Prediction of spatio-temporal AQI data

  • KyeongEun Kim;MiRu Ma;KyeongWon Lee
    • Communications for Statistical Applications and Methods / v.30 no.2 / pp.119-133 / 2023
  • With the rapid growth of the economy and of fossil fuel consumption, the concentration of air pollutants has increased significantly, and air pollution is no longer limited to small areas. We conduct a statistical analysis of actual air quality data covering all of South Korea using R and Python. Factors such as SO2, CO, O3, NO2, PM10, precipitation, wind speed, wind direction, vapor pressure, local pressure, sea level pressure, temperature, humidity, and others are used as covariates. The main goal of this paper is to predict spatio-temporal air quality index (AQI) data. Observations in big spatio-temporal datasets such as AQI data are correlated both spatially and temporally, and computing predictions or forecasts under such a dependence structure is often infeasible. The likelihood function based on the spatio-temporal model may therefore be complicated, and special modeling approaches are useful for statistically reliable predictions. In this paper, we propose several methods for these big spatio-temporal AQI data. First, a random effects model with spatio-temporal basis functions, a classical statistical approach, is proposed. Next, a neural network model, a deep learning method based on artificial neural networks, is applied. Finally, a random forest model, a machine learning method closer to computational science, is introduced. We then compare their forecasting performance in terms of predictive diagnostics. The analysis shows that all three methods predicted normal levels of PM2.5 well, but performance appears to be poor at extreme values.

Regional Frequency Analysis for Rainfall using L-Moment (L-모멘트법에 의한 강우의 지역빈도분석)

  • Koh, Deuk-Koo;Choo, Tai-Ho;Maeng, Seung-Jin;Trivedi, Chanda
    • The Journal of the Korea Contents Association / v.8 no.3 / pp.252-263 / 2008
  • This study was conducted to derive the optimal regionalization of precipitation data, classified into climatologically and geographically homogeneous regions covering all of Korea except the Cheju and Ulreung islands. A total of 65 rain gauges were used for the regional analysis of precipitation. Annual maximum series for consecutive durations of 1, 3, 6, 12, 24, 36, 48 and 72 hr were used for the statistical analyses. The K-means clustering method was used to identify homogeneous regions, and five homogeneous regions for precipitation were classified. Using the L-moment ratios and the Kolmogorov-Smirnov test, the underlying regional probability distribution was identified as the generalized extreme value (GEV) distribution among the applied distributions. The regional and at-site parameters of the GEV distribution were estimated by L-moments, which are linear combinations of the probability weighted moments. The regional and at-site analyses for the design rainfall were tested by Monte Carlo simulation. Relative root-mean-square error (RRMSE), relative bias (RBIAS) and relative reduction (RR) in RRMSE were computed and compared with those resulting from the at-site Monte Carlo simulation. All of these measures show that the regional analysis procedure substantially improves on at-site analysis in the prediction of design rainfall. Consequently, optimal design rainfalls for the regions and consecutive durations were derived by regional frequency analysis.
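The L-moment estimation step described in this abstract can be sketched as follows: sample probability weighted moments are computed from the ordered data and combined linearly into the first four L-moments and the L-moment ratios. This uses the standard unbiased estimators and is a generic illustration with an assumed function name, not the authors' code; it needs at least four observations.

```python
def sample_l_moments(data):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis."""
    x = sorted(data)
    n = len(x)
    b = []
    for r in range(4):
        total = 0.0
        # 0-based index j corresponds to rank j + 1 in the ordered sample
        for j in range(r, n):
            weight = 1.0
            for k in range(r):
                weight *= (j - k) / (n - 1 - k)
            total += weight * x[j]
        b.append(total / n)  # unbiased probability weighted moment b_r
    l1 = b[0]
    l2 = 2.0 * b[1] - b[0]
    l3 = 6.0 * b[2] - 6.0 * b[1] + b[0]
    l4 = 20.0 * b[3] - 30.0 * b[2] + 12.0 * b[1] - b[0]
    return l1, l2, l3 / l2, l4 / l2
```

The ratios `t3` and `t4` are the quantities typically plotted on an L-moment ratio diagram when choosing among candidate distributions such as the GEV.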