• Title/Summary/Keyword: Generalized additive models

Oceanographic indicators for the occurrence of anchovy eggs inferred from generalized additive models

  • Kim, Jin Yeong;Lee, Jae Bong;Suh, Young-Sang
    • Fisheries and Aquatic Sciences / v.23 no.7 / pp.19.1-19.14 / 2020
  • Three generalized additive models were applied to the distribution of anchovy eggs and oceanographic factors, using survey data from the spring and summer of 1985, 1995, and 2002, to determine the occurrence of anchovy spawning grounds in Korean waters and to identify indicators of their occurrence. Binomial and Gaussian generalized additive models (GAMs) and quantile generalized additive models (QGAMs) revealed that egg density was influenced mostly by ocean temperature and salinity in spring, and, in the upper quantiles of egg density, by the vertical structure of temperature, salinity, dissolved oxygen, and zooplankton biomass in summer. The deviance explained by the GAMs and QGAMs ranged from 18.5% to 63.2% for the summer egg distribution in the East and West Seas. For the principal component analysis (PCA)-based GAMs, the variance explained by the final regression model was 27.3-67.0%, higher than that of the regular GAMs and QGAMs for egg density in the East and West Seas. Our analysis of the distribution of anchovy eggs off the Korean coast identified optimal temperature and salinity conditions, together with high production and strong vertical mixing, as the key indicators of the major anchovy spawning grounds.
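
The "deviance explained" figures in this abstract are conventionally computed as 1 - D_model / D_null, where D_null is the deviance of an intercept-only model. A minimal sketch for the binomial (presence/absence) case follows, assuming 0/1 egg-occurrence labels and fitted probabilities taken from some already-fitted GAM (the fitting step itself is not shown, and the data are hypothetical). For a Gaussian GAM the deviance is simply the residual sum of squares.

```python
import math


def binomial_deviance(y, mu, eps=1e-12):
    """Deviance of 0/1 outcomes y given fitted probabilities mu.

    For 0/1 data the saturated model has log-likelihood 0, so the
    deviance reduces to -2 times the model log-likelihood.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        mi = min(max(mi, eps), 1.0 - eps)  # guard against log(0)
        total += -2.0 * (yi * math.log(mi) + (1 - yi) * math.log(1.0 - mi))
    return total


def deviance_explained(y, mu):
    """Proportion of null deviance explained: 1 - D_model / D_null."""
    p_bar = sum(y) / len(y)  # fitted value of the intercept-only (null) model
    d_null = binomial_deviance(y, [p_bar] * len(y))
    return 1.0 - binomial_deviance(y, mu) / d_null


# Hypothetical egg presence (1) / absence (0) at four stations.
y = [1, 0, 1, 0]
print(deviance_explained(y, [0.9, 0.1, 0.9, 0.1]))  # about 0.85
```

A model that merely reproduces the overall occurrence rate scores 0, and values approach 1 as fitted probabilities approach the observed labels.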

Comparison Studies of Hybrid and Non-hybrid Forecasting Models for Seasonal and Trend Time Series Data (트렌드와 계절성을 가진 시계열에 대한 순수 모형과 하이브리드 모형의 비교 연구)

  • Jeong, Chulwoo;Kim, Myung Suk
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.1-17 / 2013
  • In this article, several types of hybrid forecasting models are proposed. In particular, hybrid models using the generalized additive model (GAM) are newly proposed as an alternative to those using neural networks (NN). The prediction performance of various hybrid and non-hybrid models is evaluated on simulated time series data: five types of seasonal time series with an additive or multiplicative trend are generated at different noise levels and used for the forecasting evaluation. For simulated data with seasonality only, the autoregressive (AR) model and the hybrid AR-AR model performed equally well. When the series contained a trend, however, the SARIMA model and some hybrid SARIMA models outperformed the others to a similar degree. In the comparison of GAMs and NNs on seasonal data with an additive trend, the SARIMA-GAM performed consistently well across the full range of noise levels, whereas the SARIMA-NN performed well only when the noise level was negligible.
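
The hybrid idea can be sketched as a two-stage fit: a first model captures the trend, a second model is fitted to the first model's residuals, and the two forecasts are summed. In this minimal sketch an ordinary least-squares linear trend stands in for SARIMA and per-season residual means stand in for the GAM or NN stage; both substitutions are simplifications for illustration only, not the models used in the study, and the series is hypothetical.

```python
def fit_linear_trend(y):
    """Stage 1 stand-in: least-squares line y_t = a + b*t, t = 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (yt - y_mean) for t, yt in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def fit_seasonal_means(resid, period):
    """Stage 2 stand-in: average residual at each position in the season."""
    sums = [0.0] * period
    counts = [0] * period
    for t, r in enumerate(resid):
        sums[t % period] += r
        counts[t % period] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def hybrid_fit(y, period):
    """Fit stage 1 to the series, then stage 2 to stage 1's residuals."""
    a, b = fit_linear_trend(y)
    resid = [yt - (a + b * t) for t, yt in enumerate(y)]
    return a, b, fit_seasonal_means(resid, period)


def hybrid_forecast(model, n_obs, horizon, period):
    """Forecast = stage-1 trend extrapolation + stage-2 seasonal residual."""
    a, b, season = model
    return [a + b * t + season[t % period] for t in range(n_obs, n_obs + horizon)]


# Noise-free series with an additive trend and a period-4 seasonal pattern.
y = [10 + 0.5 * t + [3, -1, -2, 0][t % 4] for t in range(24)]
model = hybrid_fit(y, period=4)
print(hybrid_forecast(model, n_obs=len(y), horizon=4, period=4))
```

Because the second stage only models what the first stage left over, its in-sample error can never exceed that of the first stage alone; whether that carries over to out-of-sample forecasts is exactly what a simulation study like this one measures.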

Tuning the Architecture of Support Vector Machine: The Case of Bankruptcy Prediction

  • Min, Jae-H.;Jeong, Chul-Woo;Kim, Myung-Suk
    • Management Science and Financial Engineering / v.17 no.1 / pp.19-43 / 2011
  • Tuning the architecture of a support vector machine (SVM) aims to build an SVM model with better performance. Two tuning methods, grid search and the genetic algorithm (GA), have been addressed in the literature, each with its own methodological pros and cons. This paper proposes a combined method for tuning the architecture of SVM models that applies a generalized additive model (GAM), grid search, and the GA in sequence. The GAM is used to select input variables, and grid search and the GA are used to find optimal parameter values for the SVM models. Applying the method to a bankruptcy prediction problem, we show that the SVM model tuned by the proposed method outperforms other SVM models.
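
The grid-search stage can be sketched as an exhaustive evaluation over a Cartesian product of parameter values. This sketch assumes an RBF-kernel SVM tuned over (C, gamma) on an exponentially spaced grid, a common choice that the abstract does not confirm; `toy_score` is a hypothetical stand-in for the cross-validated accuracy of an SVM trained with those parameters, and the GA refinement stage is not shown.

```python
import itertools


def grid_search(score_fn, c_grid, gamma_grid):
    """Exhaustively evaluate score_fn(C, gamma); return (best_score, C, gamma)."""
    best = None
    for c, gamma in itertools.product(c_grid, gamma_grid):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


# Exponentially spaced grid: C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, ..., 2^3.
C_GRID = [2.0 ** k for k in range(-5, 16, 2)]
GAMMA_GRID = [2.0 ** k for k in range(-15, 4, 2)]


def toy_score(c, gamma):
    """Hypothetical stand-in for cross-validated accuracy of an RBF-kernel SVM."""
    return -abs(c - 8.0) - abs(gamma - 0.5)


print(grid_search(toy_score, C_GRID, GAMMA_GRID))
```

The coarse grid is cheap to parallelize but can only return points on the grid, which is one reason a search method such as the GA can be run afterward to refine the result.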

A credit classification method based on generalized additive models using factor scores of mixtures of common factor analyzers (공통요인분석자혼합모형의 요인점수를 이용한 일반화가법모형 기반 신용평가)

  • Lim, Su-Yeol;Baek, Jang-Sun
    • Journal of the Korean Data and Information Science Society / v.23 no.2 / pp.235-245 / 2012
  • Logistic discrimination is a useful statistical technique for quantitative analysis in the financial services industry: it is not only easy to implement but also achieves good classification rates. The generalized additive model is useful for credit scoring because it shares these advantages of logistic discrimination while also accounting for nonlinear effects of the explanatory variables. However, it may require too many additive terms when the number of explanatory variables is very large, and dependencies may exist among the variables. Mixtures of factor analyzers can be used to reduce the dimension of high-dimensional features. This study proposes using the low-dimensional factor scores of mixtures of factor analyzers as new features in the generalized additive model. The method is demonstrated on the classification of real credit scoring data, and a comparison of correct classification rates across competing techniques shows the superiority of the generalized additive model using factor scores.
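
In symbols, the proposal amounts to a logistic GAM fitted on factor scores instead of the raw variables. Assuming a binary default indicator Y, p original explanatory variables, q estimated factor scores \hat{F}_1, ..., \hat{F}_q from the mixture of factor analyzers, and smooth functions f_k estimated from the data (notation introduced here for illustration), a sketch of the model is:

```latex
\log\frac{P(Y = 1 \mid \hat{F})}{1 - P(Y = 1 \mid \hat{F})}
  = \beta_0 + \sum_{k=1}^{q} f_k\bigl(\hat{F}_k\bigr), \qquad q \ll p
```

Replacing p additive terms f_j(x_j) with q terms in the factor scores is what reduces the number of smooth terms to estimate, and factor scores absorb some of the dependence among the original variables.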

A Study on Applying Shrinkage Method in Generalized Additive Model (일반화가법모형에서 축소방법의 적용연구)

  • Ki, Seung-Do;Kang, Kee-Hoon
    • The Korean Journal of Applied Statistics / v.23 no.1 / pp.207-218 / 2010
  • The generalized additive model (GAM) is a statistical model that resolves most of the problems of the traditional linear regression model. However, overfitting can occur if no method is applied to reduce the number of independent variables, so variable selection methods for GAMs are needed. Lasso-related methods have recently become popular for variable selection in regression analysis. In this research, we consider Group Lasso and Elastic net models for variable selection in GAMs and propose an algorithm for computing their solutions. We compare the proposed methods via Monte Carlo simulation and an application to auto insurance data from fiscal year 2005. The results show that the proposed methods perform better.
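
In a GAM each smooth term is represented by a block of basis coefficients, so treating that block as one group lets Group Lasso remove an entire term at once. A minimal sketch of the two thresholding steps at the core of such algorithms follows, assuming an orthonormal design within each group (for Group Lasso, penalty lam * ||beta_g||_2) and a standardized predictor (for Elastic net, penalty lam1 * |beta| + (lam2 / 2) * beta^2). These are the standard textbook updates, not the authors' specific algorithm.

```python
import math


def group_soft_threshold(z, lam):
    """Group Lasso step: return (1 - lam / ||z||)_+ * z for coefficient group z.

    The whole group, i.e. one smooth term, is set exactly to zero when its
    Euclidean norm is at most lam.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    return [scale * v for v in z]


def elastic_net_threshold(z, lam1, lam2):
    """Elastic net coordinate update: soft-threshold by lam1, then shrink by 1 / (1 + lam2)."""
    if z > lam1:
        return (z - lam1) / (1.0 + lam2)
    if z < -lam1:
        return (z + lam1) / (1.0 + lam2)
    return 0.0


print(group_soft_threshold([3.0, 4.0], 2.5))  # shrunk toward zero, direction kept
print(group_soft_threshold([3.0, 4.0], 5.0))  # whole group removed
```

The group version shrinks all coefficients of a term by the same factor, so it keeps or drops whole smooth functions, whereas the coordinate-wise Elastic net update acts on individual coefficients.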

Comparison of Regression Models for Estimating Ventilation Rate of Mechanically Ventilated Swine Farm (강제환기식 돈사의 환기량 추정을 위한 회귀모델의 비교)

  • Jo, Gwanggon;Ha, Taehwan;Yoon, Sanghoo;Jang, Yuna;Jung, Minwoong
    • Journal of The Korean Society of Agricultural Engineers / v.62 no.1 / pp.61-70 / 2020
  • To estimate the ventilation volume of mechanically ventilated swine farms, various regression models were applied and their errors compared in order to select the model that best reproduces the actual data. Linear regression, a linear spline, polynomial regression (degrees 2 and 3), a logistic curve, a generalized additive model (GAM), and a Gompertz curve were compared. Overfitted models were excluded even when their error rates were small. The evaluation criteria were the root mean square error (RMSE) and the mean absolute percentage error (MAPE). The degree-3 polynomial exhibited the lowest error rate; however, it showed an inconsistent overestimation in a certain section. The logistic curve was the most stable and outperformed all the other models. Among the ventilation volumes estimated by all models, the logistic curve gave the smallest, excluding the model with a large error rate and the overestimating model.
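
The two evaluation criteria and the two sigmoid-shaped candidate curves have standard forms, sketched below. The curve parameterizations are common textbook forms assumed for illustration and may differ from those fitted in the study; curve fitting itself (for example by nonlinear least squares) is not shown.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def logistic_curve(x, k, r, x0):
    """Three-parameter logistic: upper asymptote k, growth rate r, midpoint x0."""
    return k / (1.0 + math.exp(-r * (x - x0)))


def gompertz_curve(x, a, b, c):
    """Gompertz curve: upper asymptote a, displacement b, growth rate c."""
    return a * math.exp(-b * math.exp(-c * x))


print(rmse([0.0, 0.0], [3.0, 3.0]))             # 3.0
print(mape([100.0, 200.0], [110.0, 180.0]))     # about 10 (percent)
print(logistic_curve(5.0, 10.0, 1.0, 5.0))      # 5.0, half the asymptote at x0
```

Both curves level off at an upper asymptote, which is consistent with their stability compared with a cubic polynomial, whose values grow without bound outside the fitted range.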

Production of Agrometeorological Information in Onion Fields using Geostatistical Models (지구 통계 모형을 이용한 양파 재배지 농업기상정보 생성 방법)

  • Im, Jieun;Yoon, Sanghoo
    • Journal of Environmental Science International / v.27 no.7 / pp.509-518 / 2018
  • Weather is the most influential factor in crop cultivation, and weather information for cultivated areas is needed to forecast crop growth and production. However, meteorological observations in cultivated areas are limited because weather equipment is not installed there. This study tested methods of predicting the daily mean temperature in onion fields using geostatistical models. Three models were considered: inverse distance weighting (IDW), a generalized additive model, and a Bayesian spatial linear model. Data were collected from the AWS (automatic weather system), the ASOS (automated synoptic observing system), and an agricultural weather station between 2013 and 2016. To evaluate prediction performance, data from the AWS and ASOS were used for modeling, and data from the agricultural weather station were used for validation. The Bayesian spatial linear model performed better than the other models. Finally, high-resolution maps of the daily mean temperature of Jeonnam were generated using all observed weather information.
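
Inverse distance weighting, the simplest of the three models, predicts a value at an unobserved site as a weighted average of station values with weights 1 / d^power. This minimal sketch assumes planar (projected) coordinates and the common choice power = 2; the study's actual power parameter and station data are not given in the abstract, so the example values are hypothetical.

```python
import math


def idw_predict(stations, target, power=2.0):
    """Inverse distance weighted estimate at target = (x, y).

    stations: list of ((x, y), value) pairs in projected coordinates.
    If the target coincides with a station, that station's value is returned.
    """
    num = 0.0
    den = 0.0
    for (sx, sy), value in stations:
        d = math.hypot(target[0] - sx, target[1] - sy)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Two hypothetical stations measuring daily mean temperature (deg C).
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw_predict(stations, (1.0, 0.0)))  # 15.0, equidistant from both
```

Unlike the GAM or the Bayesian spatial model, IDW has no covariates such as elevation and no uncertainty estimate, which is one reason it often serves as a baseline in comparisons like this one.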

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.587-595 / 2011
  • Credit scoring is an objective, automated system for assessing the credit risk of each customer. Logistic regression is one of the most popular credit scoring methods for predicting default probability; however, despite its interpretability and low computational cost, it may fail to detect nonlinear effects of the predictors. In this paper, we propose using a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble techniques such as bagging, boosting, and random forests. We compare these methods in a simulation study and illustrate them on a German credit dataset.
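
A generalized partially linear additive model for default probability keeps some predictors linear and lets others enter through smooth functions. Assuming a logit link, a vector x of predictors with linear effects, and d predictors z_j with unknown smooth effects f_j (notation introduced here for illustration), it can be written as:

```latex
\operatorname{logit} P(Y = 1 \mid \mathbf{x}, \mathbf{z})
  = \beta_0 + \mathbf{x}^{\top}\boldsymbol{\beta} + \sum_{j=1}^{d} f_j(z_j)
```

Restricting every f_j to a linear function f_j(z_j) = \gamma_j z_j recovers ordinary logistic regression, which is why the model is a direct extension of it that can still report interpretable coefficients for the linear part.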

Application of machine learning models for estimating house price (단독주택가격 추정을 위한 기계학습 모형의 응용)

  • Lee, Chang Ro;Park, Key Ho
    • Journal of the Korean Geographical Society / v.51 no.2 / pp.219-233 / 2016
  • In the social sciences, statistical models are used almost exclusively for causal explanation, and explanatory modeling has been the mainstream until now; predictive modeling, in contrast, has been rare. Hence, we focus on constructing a predictive non-parametric model rather than an explanatory one. Gangnam-gu, Seoul, was chosen as the study area, and we collected sales data for single-family houses sold between 2011 and 2014. We applied non-parametric models from the machine learning literature, including the generalized additive model (GAM), random forest, multivariate adaptive regression splines (MARS), and support vector machines (SVM). More recently developed models such as MARS and SVM showed superior predictive power for house price estimation. Finally, when spatial autocorrelation was additionally accounted for in the non-parametric models, their predictive power improved further. We hope that this study will encourage extending property price estimation methodology from traditional parametric models to non-parametric ones.

Comparison of Species Distribution Models According to Location Data (위치자료의 종류에 따른 생물종 분포모형 비교 연구)

  • Seo, Chang-Wan;Park, Yu-Ri;Choi, Yun-Soo
    • Journal of Korean Society for Geospatial Information Science / v.16 no.4 / pp.59-64 / 2008
  • Because only presence location data have been collected in Korea owing to time and budget constraints, the strengths of each species distribution model (SDM) need to be exploited. This study compared the generalized additive model (GAM), a presence-absence model, with Maxent (maximum entropy model), a presence-only model, according to the type of location data (presence/absence data). The target species was the fisher (Martes pennanti), an endangered species in California, USA. We used environmental data such as topography, climate, and vegetation, and applied the models to sub-regions and to the whole study area. The results were as follows. First, the GAM using real presence and absence data performed better than both the GAM using pseudo-absence data and Maxent using presence-only data. Second, Maxent performed better than GAM when presence-only data were used. Finally, each model fitted to one region did not predict other regions well, owing to differences in habitat environment, and over-predicted outside the study area. An optimal model should therefore be selected to predict suitable habitat according to the type and distribution of the location data.
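
When only presence records exist, a presence-absence model such as a GAM is often fitted by treating randomly drawn background cells as pseudo-absences. A minimal sketch of uniform random sampling follows, assuming the study area is discretized into grid cells identified by (row, col) tuples; the abstract does not say how the study generated its pseudo-absences, so this is one common choice rather than the authors' procedure, and the grid is hypothetical.

```python
import random


def sample_pseudo_absences(all_cells, presence_cells, n, seed=0):
    """Draw n distinct cells at random from cells with no recorded presence."""
    presence = set(presence_cells)
    candidates = [c for c in all_cells if c not in presence]
    if n > len(candidates):
        raise ValueError("not enough non-presence cells to sample from")
    rng = random.Random(seed)  # fixed seed for reproducible draws
    return rng.sample(candidates, n)


# Hypothetical 10 x 10 study grid with three presence records.
all_cells = [(i, j) for i in range(10) for j in range(10)]
presence = [(0, 0), (1, 1), (2, 2)]
pseudo_absences = sample_pseudo_absences(all_cells, presence, n=20)
print(len(pseudo_absences))
```

The sampled cells are labeled 0 and the presence cells 1 to form the response for a binomial GAM. Some pseudo-absences may be suitable but unsurveyed habitat, so such labels are noisier than true absences.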
