
FUZZY REGRESSION TOWARDS A GENERAL INSURANCE APPLICATION

  • Kim, Joseph H.T. (Department of Applied Statistics and Quantitative Risk Management (QRM), College of Business and Economics, Yonsei University)
  • Kim, Joocheol (Department of Economics and Quantitative Risk Management (QRM), College of Business and Economics, Yonsei University)
  • Received : 2013.10.04
  • Accepted : 2014.02.10
  • Published : 2014.05.30

Abstract

In many non-life insurance applications, past data are given in a form known as the run-off triangle. Smoothing such data with parametric crisp regression models has long served as the basis for estimating future claim amounts and the reserves set aside to protect the insurer from future losses. In this article a fuzzy counterpart of the Hoerl curve, a well-known claim reserving regression model, is proposed to analyze past claim data and determine the reserves. The fuzzy Hoerl curve is more flexible and general than the model considered in the previous fuzzy literature in that it includes a categorical variable together with multiple explanatory variables, which requires the development of a fuzzy analysis of covariance, or fuzzy ANCOVA. Using actual insurance run-off claim data, we show that the proposed fuzzy Hoerl curve based on the fuzzy ANCOVA gives reasonable claim reserves without the stringent assumptions needed by the traditional regression approach to claim reserving.


1. Introduction

In non-life insurance applications, determining the evolution of future claims is an important consideration for insurance companies. The estimated amount of future claims forms the basis of the reserve that must be set aside to protect the insurer from future losses. In this article we focus on non-life insurance contracts, such as auto and medical insurance, and attempt to find the fair reserve using fuzzy regression methodology.

The traditional claim reserving approach in the insurance literature typically determines the reserve to be held by the insurer for an insurance portfolio in the following sequential steps. First, the claim trend is estimated from past claim data using a standard regression model. Second, assuming that future claims will emerge in a similar fashion to those observed in the past, future claims are projected from the estimated regression model. This step also allows the stochastic nature of future claims to be captured by the error (perturbation) term of the regression. Finally, using the predicted claim amounts, the reserve of the portfolio is determined as the difference between the predicted ultimate claim amount and the (known) current claim amount.

While this crisp approach can capture some stochastic aspects of future uncertainty, its reliance on standard regression models has been criticized on several grounds. For example, the number of data points available to fit the regression model is typically small, which can lead to inadequate statistical analyses. More importantly, the error assumptions required for regression analysis, such as independence of the error terms, are easily violated in the crisp approach. This may seriously distort the credibility of the predicted future claims as well as the assessment of their uncertainty.

In light of these shortcomings, alternatives to and generalizations of insurance claim reserving methods have been proposed in the literature; see, e.g., [2] for a survey of various reserving schemes. Among these, [8] offers an alternative reserving method based on fuzzy set theory. In that paper, [8] applies the simple fuzzy linear regression of [3] to the log-transformed link ratios of the past data. The reserves obtained from this fuzzy regression are reported to compare well with those of the traditional crisp reserving approach, while requiring less stringent assumptions on the error terms than the ordinary least squares (OLS) regression method. [8] also provides a concise survey of other insurance applications of fuzzy theory.

Our contribution in this paper is twofold. First, we extend the fuzzy claim reserving method of [8], which adopts the simple fuzzy regression and ignores the cohort (that is, accident year) effect in the claim data. We employ a more general parametric model, called the Hoerl curve, which accommodates the cohort effect as well as the development periods; see, e.g., [2], [11] and [4]. The use of the Hoerl curve, however, calls for a statistical analysis known as the analysis of covariance, or ANCOVA, which combines linear regression and the analysis of variance. Our second contribution is therefore the development of a fuzzy ANCOVA model.

The article is organized as follows. Section 2 presents some background on fuzzy numbers and fuzzy regression. Section 3 explains how crisp regression is used for traditional claim reserving in insurance applications; in particular, it introduces the Hoerl curve as a flexible parametric model, which turns out to be an ANCOVA model, a blend of the linear regression model and the analysis of variance (ANOVA). Section 4 proposes the fuzzy counterpart of the crisp Hoerl curve together with the fuzzy ANCOVA procedure, and calculates the resulting fuzzy reserves for the working data. Throughout the paper we use actual insurance data taken from [4] for numerical illustrations.

 

2. Fuzzy numbers and fuzzy regression

2.1. Fuzzy numbers.

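A fuzzy number (FN) is a fuzzy subset ã of the real line. Among the different choices of FNs, we focus on triangular FNs (TFNs) because of their practicality and mathematical tractability. A TFN is written ã = (a, la, ra), where a is the center (or core) and la, ra ≥ 0 are the left and right spreads, respectively. Such an FN can be characterized explicitly by its membership function

$$\mu_{\tilde a}(x) = \begin{cases} 1 - \dfrac{a - x}{l_a}, & a - l_a \le x \le a, \\[6pt] 1 - \dfrac{x - a}{r_a}, & a < x \le a + r_a, \\[6pt] 0, & \text{otherwise}, \end{cases}$$

or, alternatively, by its α-cuts:

$$\tilde a_{\alpha} = \{x : \mu_{\tilde a}(x) \ge \alpha\} = \bigl[\, a - (1-\alpha)\, l_a,\; a + (1-\alpha)\, r_a \,\bigr], \qquad \alpha \in (0, 1].$$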
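The TFN representation above translates directly into code. The following is a minimal Python sketch, not taken from the paper: the class name `TFN` and its methods are hypothetical, spreads are assumed nonnegative, and α = 0 is taken by convention to return the closed support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (a, l, r): center a, left spread l >= 0, right spread r >= 0."""
    a: float
    l: float
    r: float

    def __post_init__(self):
        if self.l < 0 or self.r < 0:
            raise ValueError("spreads must be nonnegative")

    def membership(self, x):
        """Membership grade: rises linearly on [a - l, a], falls linearly on [a, a + r]."""
        if x == self.a:
            return 1.0
        if self.a - self.l <= x < self.a:  # empty when l == 0, so no division by zero
            return 1.0 - (self.a - x) / self.l
        if self.a < x <= self.a + self.r:  # empty when r == 0
            return 1.0 - (x - self.a) / self.r
        return 0.0

    def alpha_cut(self, alpha):
        """Interval [a - (1 - alpha) l, a + (1 - alpha) r]; alpha = 0 gives the closed support."""
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        return (self.a - (1.0 - alpha) * self.l, self.a + (1.0 - alpha) * self.r)
```

For instance, `TFN(2.0, 1.0, 2.0).alpha_cut(0.5)` returns `(1.5, 3.0)`, and the membership grade at both endpoints equals 0.5, as the α-cut definition requires.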

2.2. Fuzzy regression.

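Consider an n-variate crisp function y = ƒ(θ1, ..., θn). In the regression context, θi is the ith regression coefficient. If θ1, ..., θn are not crisp numbers but FNs ã1, ..., ãn (written ãi = (ai, lai, rai) when they are TFNs), we have

$$\tilde y = f(\tilde a_1, \ldots, \tilde a_n). \tag{2}$$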

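If we restrict ƒ(·) to be linear, so that ƒ(θ1, ..., θn) = θ1x1 + ... + θnxn, where the symbol xi has been deliberately chosen to relate to regression models, the resulting ỹ is again a TFN, ỹ = (y, ly, ry), whose three elements are given by

$$y = \sum_{i=1}^{n} a_i x_i$$

and

$$l_y = \sum_{i:\,x_i \ge 0} x_i\, l_{a_i} + \sum_{i:\,x_i < 0} |x_i|\, r_{a_i}, \qquad r_y = \sum_{i:\,x_i \ge 0} x_i\, r_{a_i} + \sum_{i:\,x_i < 0} |x_i|\, l_{a_i}.$$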

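In fact, in the linear case we can obtain not only ỹ but also a closed-form expression for its α-cuts. Suppose, without loss of generality, that ƒ is increasing in the first m ≤ n variables θ1, ..., θm (i.e., xi ≥ 0 for i ≤ m) and decreasing in the remaining variables θm+1, ..., θn (i.e., xi < 0 for i > m). The α-cuts are then simply

$$\tilde y_{\alpha} = \Bigl[\, \sum_{i=1}^{m} x_i \bigl(a_i - (1-\alpha) l_{a_i}\bigr) + \sum_{i=m+1}^{n} x_i \bigl(a_i + (1-\alpha) r_{a_i}\bigr),\;\; \sum_{i=1}^{m} x_i \bigl(a_i + (1-\alpha) r_{a_i}\bigr) + \sum_{i=m+1}^{n} x_i \bigl(a_i - (1-\alpha) l_{a_i}\bigr) \Bigr].$$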
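The spread bookkeeping for a linear function of TFNs with crisp multipliers can be sketched as follows. The helper names are hypothetical and coefficients are passed as (center, left spread, right spread) triples. A negative multiplier flips the triangle, which is why its left spread feeds the right spread of the result.

```python
def linear_tfn(coefs, xs):
    """Return (y, l_y, r_y) for y~ = sum_i a~_i x_i with TFN coefficients and crisp x_i.

    coefs: list of (a_i, l_i, r_i) triples; xs: list of crisp multipliers x_i.
    """
    if len(coefs) != len(xs):
        raise ValueError("need one x_i per coefficient")
    y = l_y = r_y = 0.0
    for (a, l, r), x in zip(coefs, xs):
        y += a * x
        if x >= 0:          # increasing in this coefficient: spreads keep their side
            l_y += x * l
            r_y += x * r
        else:               # decreasing: left and right spreads swap
            l_y += -x * r
            r_y += -x * l
    return y, l_y, r_y


def alpha_cut(tfn, alpha):
    """alpha-cut [y - (1 - alpha) l_y, y + (1 - alpha) r_y] of a TFN triple."""
    y, l, r = tfn
    return y - (1 - alpha) * l, y + (1 - alpha) * r
```

With coefficients (1, 0.5, 1) and (2, 1, 0.5) and multipliers 2 and -1, the result is the TFN (0, 1.5, 3), whose support [-1.5, 3] matches the extreme values of 2a1 - a2 over the supports of the two coefficients.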

We now describe the fuzzy regression (FR) of [3], an extension of [10]. The FR introduced here is a natural application of the linear-function result above. Consider a sample of size n with m explanatory variables. The FR is then stated as

$$\tilde Y_j = \tilde a_0 + \tilde a_1 X_{1j} + \cdots + \tilde a_m X_{mj}, \qquad j = 1, \ldots, n, \tag{5}$$

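where the coefficients are fuzzy and the explanatory variables are crisp. Assuming TFNs for all FNs, (5) has a closed-form solution Ỹj = (Yj, lYj, rYj), as before:

$$Y_j = a_0 + \sum_{k=1}^{m} a_k X_{kj}, \qquad l_{Y_j} = l_{a_0} + \sum_{k=1}^{m} \bigl( X_{kj}^{+}\, l_{a_k} + X_{kj}^{-}\, r_{a_k} \bigr), \qquad r_{Y_j} = r_{a_0} + \sum_{k=1}^{m} \bigl( X_{kj}^{+}\, r_{a_k} + X_{kj}^{-}\, l_{a_k} \bigr),$$

where $X^{+} = \max(X, 0)$ and $X^{-} = \max(-X, 0)$.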

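To estimate the FNs ã0, ã1, ..., ãm, we take the following two steps:

Step 1. Estimate the centers a0, a1, ..., am by OLS on the crisp data, which fixes the centers Yj of the fitted responses.

Step 2. Keeping the centers fixed, choose the spreads by solving the linear program

$$\min_{l_{a_k},\, r_{a_k} \ge 0} \; \sum_{j=1}^{n} \bigl(l_{Y_j} + r_{Y_j}\bigr) \quad \text{subject to} \quad Y_j - (1-\alpha^*)\, l_{Y_j} \le Y_j^{L}, \;\; Y_j + (1-\alpha^*)\, r_{Y_j} \ge Y_j^{U}, \quad j = 1, \ldots, n,$$

so that the α*-cut of each fitted Ỹj contains the observed range $[Y_j^L, Y_j^U]$.

Here the lower and upper values of the response, $Y_j^L$ and $Y_j^U$, are set to the minimum and maximum possible values of Yj, respectively. If there are repeated observations at level j, one may simply take $Y_j^U$ ($Y_j^L$) to be the maximum (minimum) of the observations at level j; otherwise, they are chosen appropriately by the experimenter. Hence we can obtain ã0, ..., ãm using the optimization algorithm above at any given level α* ∈ [0, 1). We will use this algorithm to fit the actual data later.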
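The two-step procedure can be sketched for the simplest case of one explanatory variable, Ỹ = ã0 + ã1X, assuming crisp X ≥ 0 and repeated observations at each level of X, so that the limits are the observed minimum and maximum. This is an illustrative reconstruction, not the code of [3]: the objective (total spread summed over the distinct levels) and the helper names are assumptions, and because each side of the linear program has only two unknowns it is solved here by enumerating vertices rather than with an LP library.

```python
from collections import defaultdict
from itertools import combinations


def ols_line(points):
    """Step 1: crisp OLS fit y = a0 + a1 x, giving the centers of a~0 and a~1."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a1 = sxy / sxx
    return my - a1 * mx, a1


def min_spreads(xs, need, tol=1e-9):
    """Step 2 (one side): min sum_j (u0 + x_j u1) s.t. u0 + x_j u1 >= need_j, u0, u1 >= 0.

    The costs are positive, so the optimum sits at a vertex of the feasible region,
    i.e. at an intersection of two tight constraints; every such vertex is checked.
    """
    cost0, cost1 = len(xs), sum(xs)
    # Each constraint is p*u0 + q*u1 >= d; the last two encode u0 >= 0 and u1 >= 0.
    cons = [(1.0, x, d) for x, d in zip(xs, need)] + [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    best = None
    for (p1, q1, d1), (p2, q2, d2) in combinations(cons, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < tol:
            continue  # parallel constraint lines have no unique intersection
        u0 = (d1 * q2 - d2 * q1) / det
        u1 = (p1 * d2 - p2 * d1) / det
        if all(p * u0 + q * u1 >= d - tol for p, q, d in cons):
            value = cost0 * u0 + cost1 * u1
            if best is None or value < best[0]:
                best = (value, u0, u1)
    return best[1], best[2]


def fuzzy_line(points, alpha_star):
    """Fit Y~ = a~0 + a~1 X for crisp X >= 0; returns ((a0, l0, r0), (a1, l1, r1))."""
    if not 0.0 <= alpha_star < 1.0:
        raise ValueError("alpha_star must lie in [0, 1)")
    a0, a1 = ols_line(points)
    levels = defaultdict(list)
    for x, y in points:
        levels[x].append(y)
    xs = sorted(levels)
    k = 1.0 - alpha_star
    # The alpha*-cut [Y - k l_Y, Y + k r_Y] must cover [min y, max y] at every level x.
    need_l = [(a0 + a1 * x - min(levels[x])) / k for x in xs]
    need_r = [(max(levels[x]) - a0 - a1 * x) / k for x in xs]
    l0, l1 = min_spreads(xs, need_l)
    r0, r1 = min_spreads(xs, need_r)
    return (a0, l0, r0), (a1, l1, r1)
```

Raising α* tightens the coverage constraints, since the required spreads scale with 1/(1 - α*); equivalently, smaller α* values give smaller spreads, in line with the remark on α* at the end of Section 4.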

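When ƒ(θ1, ..., θn) is a nonlinear function, its FN version (2) is not suitable for the TFN case, because the left side is in general not a TFN even when ã1, ..., ãn are. One way to resolve this conflict is to approximate ỹ by a first-order Taylor expansion; see [1]. Assume again that ƒ(θ1, ..., θn) is increasing in the first m ≤ n variables and decreasing in the remaining variables.

Then the linear (or first-order Taylor) approximation is given by

$$\tilde y \approx \Bigl( f(\mathbf{a}),\; \sum_{i=1}^{m} f'_i\, l_{a_i} - \sum_{i=m+1}^{n} f'_i\, r_{a_i},\; \sum_{i=1}^{m} f'_i\, r_{a_i} - \sum_{i=m+1}^{n} f'_i\, l_{a_i} \Bigr), \tag{8}$$

where $\mathbf{a} = (a_1, \ldots, a_n)$ and $f'_i = \partial f / \partial \theta_i$ is evaluated at $\mathbf{a}$, so that $f'_i \ge 0$ for i ≤ m and $f'_i \le 0$ for i > m.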

For linear programming in fuzzy contexts, see, for example, [5] and [6].
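The first-order approximation (8) can be sketched generically, with the crisp function and its gradient supplied by the caller; the helper name `taylor_tfn` is hypothetical. The sign of each partial derivative at the centers decides which spread feeds which side, exactly as the increasing/decreasing split of the variables does in (8).

```python
def taylor_tfn(f, grad, tfns):
    """First-order TFN approximation (y, l_y, r_y) of f(a~_1, ..., a~_n).

    f(centers) is the crisp function and grad(centers) its list of partial
    derivatives; tfns is a list of (a_i, l_i, r_i) triples.
    """
    centers = [a for a, _, _ in tfns]
    y = f(centers)
    l_y = r_y = 0.0
    for d, (_, l, r) in zip(grad(centers), tfns):
        if d >= 0:      # f increasing in this argument: spreads keep their side
            l_y += d * l
            r_y += d * r
        else:           # f decreasing: spreads swap sides
            l_y += -d * r
            r_y += -d * l
    return y, l_y, r_y
```

For the product of ã = (2, 0.1, 0.2) and b̃ = (3, 0.3, 0.1), the approximation is (6, 0.9, 0.8), i.e. support [5.1, 6.8], while the exact support is [1.9 × 2.7, 2.2 × 3.1] = [5.13, 6.82]; the gap is the price of linearizing. For linear functions the approximation is exact.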

 

3. Claim reserving method: Traditional crisp approach

In this section we introduce insurance claim reserving under the traditional crisp approach, which is based on regression analysis.

3.1. Background.

In general insurance business, contracts may take a long time to settle claims due to, e.g., legal processes. For example, consider accidents that occurred in 2013. For these accidents, the insurer makes claim payments not only in 2013 but also in the subsequent years 2014, 2015, and so on. Hence the insurance premium collected in 2013 must be large enough to cover the claims arising over multiple years from 2013 onward. In our example, the claim payment made in each year (2013, 2014, ...) is called the incremental loss of accident year 2013, and each of these years is termed a development year. Because of this time lag, one would expect the incremental loss to decrease over the development years, eventually converging to zero.

Clearly, the accident year, which refers to the time of origin of a given loss, differs from the calendar year, the ordinary year we use every day. It is standard practice to split the total claim payments made in any calendar year and attribute them to each accident year. For example, the total claim payments made in calendar year 2013 cover not only claims that occurred in the current accident year 2013, but also claims belonging to past accident years 2012, 2011, 2010, and so on. Understanding what proportion of the current calendar year's payments belongs to each accident year is important for the insurer, as it provides the basis for the premium of insurance contracts, as mentioned above.

Due to this complication, historical insurance claim data are typically presented in a so-called run-off triangle such as Table 1, which contains actual data taken from [4] (Chapter 10). Each row represents an accident year and each column a development year (period). In the table, Ci,j is the incremental loss payment made in development year j for accidents originating in accident year i. Consequently, Ci,j is paid in calendar year i + j - 1, and $\sum_{i+j-1=n} C_{i,j}$ is the insurer's total payment in calendar year n. For notational simplicity, it is customary for i to take values 1, ..., n, so that i = 1 corresponds to the first accident year reported in the data and i = n to the latest accident year, which is also the latest calendar year observed. The empty cells of the table are future losses to be predicted. The main task of run-off triangle analysis in insurance claim reserving is to fill the table with suitably predicted values from a model.
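To make the indexing concrete, the sketch below stores a run-off triangle of incremental losses as a list of rows, row i holding the n - i + 1 observed cells of accident year i, and sums the payments by calendar year. It assumes 1-based indices, so that Ci,j falls in calendar year i + j - 1 and the latest diagonal j = n - i + 1 is calendar year n. The function name and the three-year triangle in the example are hypothetical, not the data of Table 1.

```python
def calendar_year_totals(triangle):
    """Sum incremental payments C_{i,j} by calendar year i + j - 1 (1-based indices).

    triangle[i-1] holds the observed payments C_{i,1}, ..., C_{i,n-i+1} of accident year i.
    """
    n = len(triangle)
    totals = [0.0] * n
    for i, row in enumerate(triangle, start=1):
        if len(row) != n - i + 1:
            raise ValueError(f"row {i} should have {n - i + 1} observed cells")
        for j, c in enumerate(row, start=1):
            totals[i + j - 2] += c   # calendar year i + j - 1, stored 0-based
    return totals
```

For the hypothetical triangle `[[100, 50, 10], [120, 60], [130]]` the calendar-year totals are `[100.0, 170.0, 200.0]`; only the last entry, the latest diagonal, belongs to calendar year n = 3.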

TABLE 1. Run-off table of incremental loss data Ci,j

TABLE 2. Run-off table of cumulative loss data Zi,j

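In addition to the incremental loss Ci,j, standard claim reserving requires several related quantities: the cumulative loss of accident year i up to development year j (Table 2),

$$Z_{i,j} = \sum_{k=1}^{j} C_{i,k}, \tag{9}$$

the link ratio,

$$r_{i,j} = \frac{Z_{i,j}}{Z_{i,j-1}}, \qquad j \ge 2, \tag{10}$$

and the future projection rate from development year j to development year s > j,

$$f^{(i)}_{j,s} = \frac{Z_{i,s}}{Z_{i,j}} = \prod_{k=j+1}^{s} r_{i,k}. \tag{11}$$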
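A minimal sketch of these quantities for the rows of a run-off triangle follows. It assumes the link-ratio convention ri,j = Zi,j / Zi,j-1 and 1-based development years; other texts index the link ratio one step differently, and the function names are hypothetical.

```python
from itertools import accumulate


def cumulative(triangle):
    """Z_{i,j} = C_{i,1} + ... + C_{i,j} for every observed cell of each accident year."""
    return [list(accumulate(row)) for row in triangle]


def link_ratio(z_row, j):
    """r_{i,j} = Z_{i,j} / Z_{i,j-1} for development year j >= 2 (1-based)."""
    return z_row[j - 1] / z_row[j - 2]


def projection_rate(z_row, j, s):
    """f^{(i)}_{j,s} = Z_{i,s} / Z_{i,j}, equal to the product r_{i,j+1} * ... * r_{i,s}."""
    return z_row[s - 1] / z_row[j - 1]
```

For a hypothetical first row with payments 100, 50 and 10, the cumulative losses are 100, 150 and 160, the link ratio r1,2 is 1.5, and the projection rate from development year 1 to 3 is 1.6, the product of the two link ratios.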

3.2. Finding past claim trend using ANCOVA.

3.2.1. Modeling claim trend with regression.

In order to predict future claims, we first need to find the past claim trend. Traditionally, insurers have used past values of Ci,j, Zi,j or ri,j to model the past claim trend, and various parametric models based on each of them have been suggested in the literature; see, e.g., [2] for a survey. In [8] the log-transformed link ratio is regressed on the development year, ignoring the accident year effect, to yield a simple linear FR, an idea motivated by the crisp approach of [9]. That is, the regression equation

is analyzed using the OLS.

In the present article, we consider a more general parametric model called the Hoerl curve, as discussed in [2], [11] and [4] (Chapter 10). In the Hoerl curve the incremental loss, rather than the link ratio, is modeled directly:

$$C_{i,j} = e^{c + \alpha_i}\, j^{\beta}\, e^{\gamma j}. \tag{13}$$

Taking logarithms on both sides, one arrives at the linear equation

$$\log C_{i,j} = c + \alpha_i + \beta \log j + \gamma j. \tag{14}$$

Once error terms are added, (14) can be analyzed within the linear regression framework, with the parameters of the mean response suitably estimated. From the regression perspective, the log-transformed Hoerl curve (14) has several advantages over Sherman's model (12) considered in [8]. First, the Hoerl curve is more flexible, as it has two explanatory variables, log j and j, leading to a multiple linear regression rather than the simple one in [8]; its power to explain the response variable should therefore be better in general. Second, unlike Sherman's model, the Hoerl curve takes the accident year (or cohort) effect into account. This is a desirable feature of any claim reserving model, because claims from the same cohort may share common characteristics over time, such as an inflation factor. Overall, the Hoerl curve provides a more realistic parametric model, with additional parameters relative to Sherman's model. The additional parameters, however, change the structure of the regression model, leading to a model commonly known in the statistical literature as the analysis of covariance (ANCOVA).

3.2.2. ANCOVA.

In the statistical literature, ANCOVA essentially blends the linear regression model and the analysis of variance (ANOVA). The model considered in ANCOVA is typically a multiple regression with at least one quantitative and one categorical explanatory variable. Usually the discrete categorical variable stands for different groups or factors (e.g., different types of treatment, or gender), and the quantitative variables are control variables included to improve power. Restricting ourselves to the case of one categorical variable with no interaction between it and the m quantitative variables, the regression equation is

$$Y_{ij} = \mu + T_i + b_1 X_{1,ij} + \cdots + b_m X_{m,ij} + \varepsilon_{ij}, \tag{15}$$

where μ + Ti represents the effect of the ith group (treatment) of the categorical variable, and Xk is the kth quantitative explanatory variable. The model (15) thus relates the response variable to both categorical and quantitative variables. Note, however, that the intercept is the only coefficient that varies across groups; the other coefficients are common to all groups. This means that, once the group effect has been neutralized by adjusting the intercepts, all the data share the same regression model. An alternative way of handling the categorical variable is simply to treat it as another explanatory variable, in which case standard regression can be carried out with no additional difficulty. Such an approach, however, differs from the model (15): the coefficient of the categorical variable then captures only a linear trend across categories, treating the variable as if it were continuous, and therefore cannot capture qualitative distinctions among groups. See standard statistics texts, e.g., [7], for further details on ANCOVA.

In light of this, the regression equation (14) is a special case of the model (15), with μ = c, Ti = αi, m = 2, X1 = log j and X2 = j, and is thus suitable for ANCOVA. In particular, in (14) the index i, or equivalently the whole intercept term c + αi, captures the effect of the different accident years in the run-off triangle. The remaining variability of the data is explained by the development year through j and log j. So, after adjusting for preexisting differences among accident years, all claims should evolve in the same fashion as a function of the development year. Using a standard statistical package, we obtain the parameter estimates (16) for the working data.

ANCOVA is a widely used analysis method in statistics, but its fuzzy counterpart has rarely been studied in the literature. In the next section we propose a fuzzy ANCOVA method based on the FR of [3], and use it to predict future claims and estimate the fair reserves for an insurance portfolio. In passing, we also fitted the alternative model in which the categorical variable is treated as an ordinary explanatory variable. In that model the estimated coefficient of the accident year is 0.0982, which is positive and thus always implies a higher log incremental loss for a later accident year, contradicting the intercept pattern in (16), where some later accident years have smaller intercepts. In the sequel, we focus on the ANCOVA model only.
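The crisp ANCOVA fit of (14) can be reproduced without a statistics package by ordinary least squares with one dummy variable per accident year plus the regressors log j and j. The sketch below uses hypothetical function names and solves the normal equations by Gaussian elimination with partial pivoting. It reports the combined intercepts c + αi, because c and the αi are identified separately only after an extra constraint such as α1 = 0. It is adequate for a small triangle; a QR-based solver would be numerically preferable for larger designs.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_hoerl(triangle):
    """OLS for log C_{i,j} = (c + alpha_i) + beta log j + gamma j with one intercept per row.

    triangle[i] holds the observed incremental losses of accident year i + 1.
    Returns (intercepts, beta, gamma).
    """
    w = len(triangle)
    rows, ys = [], []
    for i, row in enumerate(triangle):
        for j, c in enumerate(row, start=1):
            dummies = [1.0 if g == i else 0.0 for g in range(w)]
            rows.append(dummies + [math.log(j), float(j)])
            ys.append(math.log(c))
    p = w + 2
    xtx = [[sum(r[u] * r[v] for r in rows) for v in range(p)] for u in range(p)]
    xty = [sum(r[u] * y for r, y in zip(rows, ys)) for u in range(p)]
    coef = solve(xtx, xty)
    return coef[:w], coef[w], coef[w + 1]
```

Feeding it a triangle generated exactly from (13) with known parameters returns those parameters up to rounding, which is a quick check before applying it to real data such as Table 1.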

3.3. Predicting future claim evolution.

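The estimated regression model (14) essentially smooths the past claim trend. In insurance claim reserving it is conventional to assume that future claims will emerge in a similar fashion to those observed in the past, so future claims are projected from the estimated regression model. To do so, one first back-transforms to recover the original quantities. From (10) and the Hoerl curve (13), the link ratio is then

$$r_{i,j} = \frac{Z_{i,j}}{Z_{i,j-1}} = \frac{\sum_{k=1}^{j} e^{c+\alpha_i} k^{\beta} e^{\gamma k}}{\sum_{k=1}^{j-1} e^{c+\alpha_i} k^{\beta} e^{\gamma k}} = \frac{\sum_{k=1}^{j} k^{\beta} e^{\gamma k}}{\sum_{k=1}^{j-1} k^{\beta} e^{\gamma k}}, \qquad j \ge 2,$$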

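which simplifies matters because it does not depend on i. Hence the future projection rate (11) takes a simple form without the index i (so the superscript is omitted):

$$f_{j,s} = \frac{\sum_{k=1}^{s} \exp(\beta \log k + \gamma k)}{\sum_{k=1}^{j} \exp(\beta \log k + \gamma k)}, \qquad j < s. \tag{18}$$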

The ultimate cumulative loss Zi,n, defined as the last column of the run-off table of Zi,j, is then estimated as the product of the most recent cumulative loss and the future projection rate over the remaining horizon:

$$\hat Z_{i,n} = Z_{i,n-i+1}\, f_{n-i+1,n}.$$

Finally, the reserve for accident year i, determined in calendar year n, is defined as the difference between the ultimate cumulative loss and the current cumulative loss:

$$R_i = \hat Z_{i,n} - Z_{i,n-i+1} = Z_{i,n-i+1} \bigl( f_{n-i+1,n} - 1 \bigr).$$

The reserve is most easily understood using Table 2: Ri is the difference between the last-column value of row i after the table has been completely filled and the latest observed value of the triangle in row i. The total reserve is then simply the sum of the reserves of all accident years:

$$R = \sum_{i=1}^{n} R_i.$$

For the numerical data, the crisp reserve results are provided in the top panel of Table 3, which will be further discussed in Section 4.
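The crisp projection rate (18) and the reserves Ri and R can be sketched as follows, assuming fitted values of β and γ are available and that the latest cumulative losses Zi,n-i+1 are supplied in accident-year order; the function names are hypothetical.

```python
import math


def hoerl_projection_rate(beta, gamma, j, s):
    """f_{j,s} = sum_{k<=s} g(k) / sum_{k<=j} g(k), with g(k) = exp(beta log k + gamma k).

    The accident-year factor exp(c + alpha_i) cancels in the ratio, so i does not appear.
    """
    if not 1 <= j <= s:
        raise ValueError("need 1 <= j <= s")
    g = [math.exp(beta * math.log(k) + gamma * k) for k in range(1, s + 1)]
    return sum(g) / sum(g[:j])


def crisp_reserves(latest, beta, gamma):
    """R_i = Z_{i,n-i+1} (f_{n-i+1,n} - 1) for each accident year, plus the total R.

    latest[i-1] is the latest observed cumulative loss Z_{i,n-i+1} of accident year i.
    """
    n = len(latest)
    reserves = []
    for i, z in enumerate(latest, start=1):
        j = n - i + 1                                          # latest observed development year
        ultimate = z * hoerl_projection_rate(beta, gamma, j, n)  # estimate of Z_{i,n}
        reserves.append(ultimate - z)
    return reserves, sum(reserves)
```

The oldest accident year always gets a zero reserve because fn,n = 1, and every other reserve is positive because each extra term exp(β log k + γk) in the numerator of (18) is positive.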

 

4. Claim reserving method using fuzzy regression

4.1. Motivation.

We have shown how the classical (crisp) claim reserving approach uses a linear regression model to smooth past claim data, as in (14), and to predict the projection rates. However, the assumptions underlying the standard regression model have been criticized on several grounds, as briefly mentioned in the Introduction. Specifically, the error terms added in (14) are unlikely to be independent and identically distributed; in particular, past claims are unlikely to be uncorrelated. Also, when the data size is small, as is often the case in insurance applications, statistical analyses may not yield meaningful conclusions. In this section we consider claim reserving with the fuzzy regression method as an alternative that overcomes these difficulties of the standard approach.

4.2. Fuzzy ANCOVA.

Our fuzzy ANCOVA adapts the fuzzy regression method proposed in [3], whose procedure is described in Section 2.2. As before, we assume that the regression coefficients are fuzzy while the explanatory variables are crisp. To begin with, consider the fuzzy counterpart of (15):

$$\tilde Y_{ij} = (\tilde\mu + \tilde T_i) + \tilde b_1 X_{1,ij} + \cdots + \tilde b_m X_{m,ij}, \tag{22}$$

which differs from (5), so the FR approach of Section 2 is not directly applicable. To tackle this problem, we first recall that in the crisp ANCOVA model (15) the data share the same regression model after adjusting for preexisting differences among the nonequivalent groups. Another way to see this is to rearrange (22) as

$$\tilde Y_{ij} - (\tilde\mu + \tilde T_i) = \tilde b_1 X_{1,ij} + \cdots + \tilde b_m X_{m,ij}. \tag{23}$$

The left side is then the response variable net of the group (treatment) effect, so that the coefficients on the right side no longer depend on the index i. Consequently, we denote the left side by Ỹj, omitting i, and express (23) as

$$\tilde Y_j = \tilde b_1 X_{1j} + \cdots + \tilde b_m X_{mj}, \tag{24}$$

which is equivalent to the FR in (5) without the intercept term. If there are w different groups (treatments) with ni observations in group i, i = 1, ..., w, the index j runs over $1, \ldots, \sum_{i=1}^{w} n_i$. As the solution of the FR in (24) is readily available from Section 2, the remaining task is to determine the intercept μ̃ + T̃i for each i, so that the final FN response variable can be obtained from

$$\tilde Y_{ij} = (\tilde\mu + \tilde T_i) + \tilde Y_j.$$

Essentially, the challenge lies in estimating the intercept FN μ̃ + T̃i separately, in the presence of the other FN coefficients in the FR model (5). In theory, the intercept FN can vary over i in its center as well as in its spreads. If, however, we assume that the categorical variable describes qualitative types or classes, as is the case in most applications, and thus cannot be fuzzy by nature, we can argue that both μ and Ti should be crisp numbers obtained from OLS in ANCOVA. We believe this choice is consistent with the spirit of ANCOVA, because the term μ + Ti is the only source representing the categorical variable in model (22), and this source should not be fuzzy by nature: it categorizes, e.g., genders (groups), types of treatment, or, in our case, accident years.
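With crisp intercepts, the bookkeeping around the FR step reduces to two operations: subtract the estimated μ + Ti from every observation to obtain the net response and its lower and upper limits at each level j, and afterwards shift the fitted TFN Ỹj by the crisp intercept. A minimal sketch, with hypothetical names and observations given as (group, level, response) triples:

```python
from collections import defaultdict


def net_response_limits(obs, intercepts):
    """Net responses y_ij - (mu + T_i) and their limits (Y_j^L, Y_j^U) at each level j.

    obs: iterable of (group i, level j, crisp response y_ij);
    intercepts: mapping i -> crisp OLS estimate of mu + T_i.
    """
    net = defaultdict(list)
    for i, j, y in obs:
        net[j].append(y - intercepts[i])
    return {j: (min(v), max(v)) for j, v in sorted(net.items())}


def add_crisp_intercept(tfn, intercept):
    """Y~_ij = (mu + T_i) + Y~_j: a crisp shift moves the center and leaves both spreads unchanged."""
    center, left, right = tfn
    return center + intercept, left, right
```

Because the intercept is crisp, adding it moves only the center; the spreads of Ỹij at a given level are those of Ỹj, whichever group the observation belongs to.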

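To summarize, the estimation procedure of the fuzzy ANCOVA analysis for (22) is as follows:

Step 1. Fit the crisp ANCOVA model (15) by OLS; the estimates of μ + Ti are used as crisp intercepts, and those of the remaining coefficients as the centers of the FN coefficients.

Step 2. Form the net response by subtracting the estimated intercept μ + Ti from each observation, which yields the intercept-free FR model (24) in the crisp explanatory variables.

Step 3. Set the lower and upper limits of the net response at each level j, estimate the spreads of the FN coefficients in (24) at a chosen level α* by the procedure of Section 2.2, and add back the crisp intercept μ + Ti to obtain Ỹij.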

4.3. Finding past claim trend using fuzzy ANCOVA.

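We now apply the general procedure developed in the previous subsection to the working data, with the fuzzy regression equation of the log Hoerl curve

$$\log \tilde C_{i,j} = (c + \alpha_i) + \tilde\beta \log j + \tilde\gamma j,$$

where c + αi is crisp and β̃ and γ̃ are TFNs. Step 1 has already been carried out using crisp OLS, with the estimates given in (16). For Step 2, we set $Y_j = \log C_{i,j} - (c + \alpha_i)$ and $(X_{1j}, X_{2j}) = (\log j,\, j)$.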

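In Step 3, for the upper and lower limits of Yj, we naturally set

$$Y_j^{U} = \max_{i} \bigl\{ \log C_{i,j} - (c + \alpha_i) \bigr\}$$

and

$$Y_j^{L} = \min_{i} \bigl\{ \log C_{i,j} - (c + \alpha_i) \bigr\},$$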

where i = 1, ..., 8. At α∗ = 0.3, we obtain the two TFN parameters:

and

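From (18), the fuzzy projection rate is given by

$$\tilde f_{j,s} = \frac{\sum_{k=1}^{s} \exp(\tilde\beta \log k + \tilde\gamma k)}{\sum_{k=1}^{j} \exp(\tilde\beta \log k + \tilde\gamma k)}.$$

Note that ƒj,s is a nonlinear function of β and γ (the intercept term c + αi cancels in the ratio), so the resulting FN is not a TFN, which calls for a linear approximation. To use the first-order Taylor expansion described in Section 2, we compute the partial derivatives of ƒj,s in (18):

$$\frac{\partial f_{j,s}}{\partial \beta} = \frac{\sum_{k=1}^{s} (\log k)\, e^{\beta \log k + \gamma k} \sum_{k=1}^{j} e^{\beta \log k + \gamma k} - \sum_{k=1}^{s} e^{\beta \log k + \gamma k} \sum_{k=1}^{j} (\log k)\, e^{\beta \log k + \gamma k}}{\bigl( \sum_{k=1}^{j} e^{\beta \log k + \gamma k} \bigr)^{2}} \tag{30}$$

and

$$\frac{\partial f_{j,s}}{\partial \gamma} = \frac{\sum_{k=1}^{s} k\, e^{\beta \log k + \gamma k} \sum_{k=1}^{j} e^{\beta \log k + \gamma k} - \sum_{k=1}^{s} e^{\beta \log k + \gamma k} \sum_{k=1}^{j} k\, e^{\beta \log k + \gamma k}}{\bigl( \sum_{k=1}^{j} e^{\beta \log k + \gamma k} \bigr)^{2}}. \tag{31}$$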

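In addition, one can show that both partial derivatives are positive for j < s. To prove this, we write g(k) = exp(β log k + γk), which is always positive, for notational convenience. Since the denominator of (30) is positive, it suffices to consider its numerator:

$$\sum_{i=1}^{s} g(i) \log i \sum_{k=1}^{j} g(k) - \sum_{i=1}^{s} g(i) \sum_{k=1}^{j} g(k) \log k = \sum_{i=1}^{s} \sum_{k=1}^{j} g(i) g(k) (\log i - \log k) = \sum_{i=j+1}^{s} \sum_{k=1}^{j} g(i) g(k) (\log i - \log k) > 0,$$

where the terms with i ≤ j cancel in pairs because they are antisymmetric in i and k. The last inequality holds because log i - log k > 0 for all k = 1, ..., j and i = j + 1, ..., s.

Similarly, for (31),

$$\sum_{i=1}^{s} \sum_{k=1}^{j} g(i) g(k) (i - k) = \sum_{i=j+1}^{s} \sum_{k=1}^{j} g(i) g(k) (i - k) > 0.$$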

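Now, keeping in mind the signs of the partial derivatives, the approximated TFN, denoted $\tilde f^{*}_{j,s}$, is given by (8) as

$$\tilde f^{*}_{j,s} = \Bigl( f_{j,s},\; \frac{\partial f_{j,s}}{\partial \beta}\, l_{\beta} + \frac{\partial f_{j,s}}{\partial \gamma}\, l_{\gamma},\; \frac{\partial f_{j,s}}{\partial \beta}\, r_{\beta} + \frac{\partial f_{j,s}}{\partial \gamma}\, r_{\gamma} \Bigr),$$

where β̃ = (β, lβ, rβ) and γ̃ = (γ, lγ, rγ), and ƒj,s and its partial derivatives are evaluated at the centers β and γ.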

Using the estimated values of β̃ and γ̃, one can readily calculate the fuzzy projection rates for every j and s needed to fill the run-off table.
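The derivatives (30) and (31) and the resulting TFN approximation of the fuzzy projection rate can be sketched as below, assuming β̃ and γ̃ are given as (center, left spread, right spread) triples; the function names are hypothetical. Because both derivatives are positive for j < s, left spreads combine only with left spreads and right with right, as in the approximation obtained from (8).

```python
import math


def _sums(beta, gamma, upto):
    """Sums over k = 1..upto of g(k), g(k) log k and g(k) k, with g(k) = exp(beta log k + gamma k)."""
    g = [(k, math.exp(beta * math.log(k) + gamma * k)) for k in range(1, upto + 1)]
    return (sum(v for _, v in g),
            sum(v * math.log(k) for k, v in g),
            sum(v * k for k, v in g))


def rate_and_partials(beta, gamma, j, s):
    """Crisp f_{j,s} of (18) and its partial derivatives (30) and (31)."""
    if not 1 <= j <= s:
        raise ValueError("need 1 <= j <= s")
    S, S_b, S_g = _sums(beta, gamma, s)
    J, J_b, J_g = _sums(beta, gamma, j)
    return S / J, (S_b * J - S * J_b) / J ** 2, (S_g * J - S * J_g) / J ** 2


def fuzzy_projection_rate(beta_tfn, gamma_tfn, j, s):
    """Linear TFN approximation of f~_{j,s}; both partials are >= 0, so spreads keep their sides."""
    (b, lb, rb), (c, lc, rc) = beta_tfn, gamma_tfn
    f, d_b, d_g = rate_and_partials(b, c, j, s)
    return f, d_b * lb + d_g * lc, d_b * rb + d_g * rc
```

Comparing `rate_and_partials` against central finite differences of ƒj,s is a cheap way to confirm (30) and (31) when adapting the code; for j = s the approximation collapses to the crisp value (1, 0, 0).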

4.4. Fuzzy claim reserve.

Recall that in the crisp approach the future cumulative loss Zi,s, s > n - i + 1, is given by

$$\hat Z_{i,s} = Z_{i,n-i+1}\, f_{n-i+1,s}.$$

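In the fuzzy approach, we therefore use the linear approximation of the FN

$$\tilde Z_{i,s} = Z_{i,n-i+1}\, \tilde f^{*}_{n-i+1,s} = \bigl( Z_{i,n-i+1} f_{n-i+1,s},\; Z_{i,n-i+1}\, l_{f_{n-i+1,s}},\; Z_{i,n-i+1}\, r_{f_{n-i+1,s}} \bigr), \tag{34}$$

where $\tilde f^{*}_{n-i+1,s} = (f_{n-i+1,s}, l_{f_{n-i+1,s}}, r_{f_{n-i+1,s}})$ is the TFN approximation of the fuzzy projection rate obtained from (8). Since Zi,n-i+1 > 0 is crisp, (34) is a TFN. Therefore the fuzzy reserve for accident year i is given by

$$\tilde R_i = \tilde Z_{i,n} - Z_{i,n-i+1} = \bigl( Z_{i,n-i+1} (f_{n-i+1,n} - 1),\; Z_{i,n-i+1}\, l_{f_{n-i+1,n}},\; Z_{i,n-i+1}\, r_{f_{n-i+1,n}} \bigr),$$

and the total reserve by

$$\tilde R = \sum_{i=1}^{n} \tilde R_i,$$

whose center and spreads are the sums of the centers and spreads of the R̃i.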

For our run-off cumulative claim data in Table 2, we present the fuzzy values (34) of Z̃i,j in Table 3 and the reserves in Table 4. To show the trend from the past, we also include the past Zi,j values in the top panel of Table 3.
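Given a TFN projection rate for each starting development year, the fuzzy reserves follow by scaling with the crisp, positive Zi,n-i+1 and then shifting by it. A minimal sketch, with hypothetical names, where `fuzzy_rate(j, s)` is assumed to return the triple (f, lf, rf):

```python
def fuzzy_reserves(latest, fuzzy_rate):
    """Fuzzy reserves R~_i = Z_{i,n-i+1} * f~_{n-i+1,n} - Z_{i,n-i+1} and their TFN total.

    latest[i-1] is the crisp, positive latest cumulative loss Z_{i,n-i+1};
    fuzzy_rate(j, s) returns the TFN approximation (f, l_f, r_f) of the projection rate.
    """
    n = len(latest)
    per_year = []
    for i, z in enumerate(latest, start=1):
        if z <= 0:
            raise ValueError("latest cumulative losses must be positive")
        f, l_f, r_f = fuzzy_rate(n - i + 1, n)
        # Scaling by z > 0 scales center and spreads; subtracting crisp z shifts the center only.
        per_year.append((z * f - z, z * l_f, z * r_f))
    total = tuple(sum(t[k] for t in per_year) for k in range(3))
    return per_year, total
```

The total is again a TFN: its center and spreads are sums over accident years, which is consistent with TFN addition.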

TABLE 3. Predicted fuzzy values of cumulative loss Zi,j: center, left and right spreads

TABLE 4. Fuzzy reserves for each accident year

From Table 4, the total fuzzy reserve is

$$\tilde R = (659.79,\; 473.79,\; 618.5),$$

meaning that the claim reserve for this portfolio is approximately 659.79, but the actual value may fall below (above) it by no more than 473.79 (618.5). Similar conclusions can be drawn for other choices of α*, with smaller values leading to smaller spreads for both the cumulative claims and the reserves.
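As a worked reading of these figures, the α-cut formula for TFNs turns the total reserve (659.79, 473.79, 618.5) into intervals: its support (α = 0) and, for example, its cut at α = 0.3, the level used in the estimation. This only restates the reported values.

```latex
\begin{aligned}
\tilde R_{0}   &= [\,659.79 - 473.79,\; 659.79 + 618.5\,] = [\,186.00,\; 1278.29\,], \\
\tilde R_{0.3} &= [\,659.79 - 0.7 \times 473.79,\; 659.79 + 0.7 \times 618.5\,] \approx [\,328.14,\; 1092.74\,].
\end{aligned}
```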

 

5. Concluding remarks

In non-life insurance applications, determining the evolution of future claims is an important task in calculating the reserve that must be set aside to protect the insurer from future losses. The traditional claim reserving approach in the insurance literature typically uses a parametric regression model to estimate future claim amounts and thus obtain the reserve. However, this crisp approach has been criticized on statistical grounds, including the violation of the error assumptions required for regression analysis, and the fuzzy regression method can serve as an alternative. In this article we extend the fuzzy claim reserving method of [8], which adopts the simple fuzzy regression and ignores the cohort effect in the claim data. We develop a fuzzy counterpart of the well-known Hoerl curve, which accommodates the cohort effect as well as the development periods. This task also calls for a fuzzy counterpart of the analysis of covariance, or ANCOVA, a combination of linear regression and the analysis of variance. Our proposed fuzzy ANCOVA is simple to use and consistent with the statistical ANCOVA. Using actual insurance claim data, we find that the fuzzy Hoerl curve adequately calculates the relevant claim reserving quantities.

References

  1. D. Dubois and H. Prade, Analysis of Fuzzy Information, Vol. 2, chapter Fuzzy numbers: an overview. CRC-Press, Boca Raton (1988).
  2. P. England and R. Verrall, Stochastic claims reserving in general insurance. British Actuarial Journal, Vol. 8 (2002), No. 3, 443-518. https://doi.org/10.1017/S1357321700003809
  3. H. Ishibuchi and M. Nii, Fuzzy regression using asymmetric fuzzy coefficients and fuzzified neural networks. Fuzzy Sets and Systems, Vol. 119 (2001), No. 2, 273-290. https://doi.org/10.1016/S0165-0114(98)00370-4
  4. R. Kaas, Modern actuarial risk theory: Using R, Springer, New York, 2008.
  5. A. Kumar and A. Kaur, Application of linear programming for solving fuzzy transportation problems, Journal of Applied Mathematics and Informatics, Vol. 29 (2011), No. 3-4, 831-846. https://doi.org/10.14317/JAMI.2011.29.3_4.831
  6. H. Maleki and M. Mashinchi, Fuzzy number linear programming: A probabilistic approach (3), Journal of Applied Mathematics and Informatics, Vol. 15 (2004), No. 1-2, 333-341.
  7. J. Neter, M. Kutner, C. Nachtsheim, and W. Wasserman, Applied linear statistical models. Irwin, Chicago, 4th edition, 1996.
  8. J. Sanchez, Calculating insurance claim reserves with fuzzy regression. Fuzzy Sets and Systems, Vol. 157 (2006), No. 23, 3091-3108. https://doi.org/10.1016/j.fss.2006.07.003
  9. R. Sherman, Extrapolating, smoothing, and interpolating development factors. In Proceedings of the Casualty Actuarial Society, Vol. 71 (1984), 122-155.
  10. H. Tanaka, S. Uejima, and K. Asai, Linear regression analysis with fuzzy model. IEEE Trans. Systems Man Cybern, Vol. 12 (1982), 903-907. https://doi.org/10.1109/TSMC.1982.4308925
  11. T. Wright, A stochastic method for claims reserving in general insurance. Journal of the Institute of Actuaries Vol. 117 (1990), 677-731. https://doi.org/10.1017/S0020268100043262