Predicting claim size in the auto insurance with relative error: a panel data approach

Predicting auto insurance claim amounts using relative error prediction: a study using panel data

  • Park, Heungsun (Department of Statistics, Hankuk University of Foreign Studies)
  • 박흥선 (한국외국어대학교 통계학과)
  • Received : 2021.06.09
  • Accepted : 2021.07.02
  • Published : 2021.10.31

Abstract

Relative error prediction is preferred over ordinary prediction methods when relative (percentage) errors are regarded as important, especially in econometrics, software engineering, and official government statistics. Relative error prediction techniques have been developed for linear and nonlinear regression, nonparametric regression using kernel regression smoothers, and stationary time series models. However, random effect models have not yet been used in relative error prediction. The purpose of this article is to extend relative error prediction to panel data under certain generalized linear mixed models (GLMMs), namely random effect models based on the gamma, lognormal, or inverse Gaussian distribution. For illustration, real auto insurance data are used to predict claim size, and the best predictor and the best relative error predictor are compared.

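Prediction based on relative error is preferred over conventional prediction methods in fields where relative (or percentage) error matters, particularly econometrics, software engineering, and official government statistics. Relative error prediction has so far been extended from linear and nonlinear regression to nonparametric regression models using kernel regression and to stationary time series analysis. However, these analyses considered only fixed effects, so an extension of relative error prediction to random effects was needed. The purpose of this paper is to apply relative error prediction to panel data under gamma regression, lognormal regression, and inverse Gaussian regression, all of which belong to the generalized linear mixed model (GLMM) family. To this end, real claim amount data from an auto insurance company were used, and the best predictor and the best relative error predictor were each applied and compared.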
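
As a rough illustration of the two predictors compared in the abstract, the sketch below ignores covariates and random effects entirely and looks only at a single claim size Y with a known distribution. It assumes the usual definitions: the best predictor minimizes E[(Y - g)^2] and equals E[Y], while the best relative error predictor minimizes E[((Y - g)/Y)^2] and equals E[1/Y] / E[1/Y^2]. The function names and parameter values are hypothetical and are not taken from the article; the gamma and lognormal moment formulas are standard closed forms.

```python
import math


def gamma_predictors(shape, scale):
    """Best predictor and best relative error predictor for Y ~ Gamma(shape, scale).

    Hypothetical helper for illustration only; not the article's GLMM procedure.
    """
    if shape <= 2:
        # E[1/Y^2] is finite only when shape > 2.
        raise ValueError("shape must exceed 2 so that E[1/Y^2] is finite")
    mean = shape * scale                                      # E[Y]
    inv1 = 1.0 / (scale * (shape - 1))                        # E[1/Y]
    inv2 = 1.0 / (scale ** 2 * (shape - 1) * (shape - 2))     # E[1/Y^2]
    return mean, inv1 / inv2


def lognormal_predictors(mu, sigma):
    """Best predictor and best relative error predictor for log(Y) ~ N(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)                      # E[Y]
    inv1 = math.exp(-mu + sigma ** 2 / 2)                     # E[1/Y]
    inv2 = math.exp(-2 * mu + 2 * sigma ** 2)                 # E[1/Y^2]
    return mean, inv1 / inv2


if __name__ == "__main__":
    # Illustrative parameter values only (assumed, not estimated from any data).
    print(gamma_predictors(4.0, 500.0))
    print(lognormal_predictors(7.0, 0.5))
```

In both families the ratio simplifies: scale × (shape − 2) for the gamma and exp(mu − 1.5 sigma^2) for the lognormal. The relative error predictor therefore sits below the mean, because under-prediction is penalized less heavily than over-prediction when errors are measured relative to Y.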

Acknowledgement

This research was supported by the Hankuk University of Foreign Studies Research Fund of 2020.
