class: center, middle, inverse, title-slide # Loss modelling and reserving analytics in R ## A hands-on workshop
### Katrien Antonio & Jonas Crevecoeur ###
IA|BE workshop
| June 3, 10 & 17, 2021 --- # Today's Outline .pull-left[ * [Motivation and strategies](#motivation) * [Reserving data structures](#vertical_aggregation) - individual and aggregated reserving data - runoff triangles * [Claims reserving with triangles](#chainladder) - chain ladder method - {ChainLadder} package - chain ladder implementation with GLMs ] .pull-right[ * [When the chain ladder method fails](#chainladderfail) - detection tools for triangle stability - creating homogeneous triangles * [Research outlook](#research_outlook) - granular reserving methods for predicting the IBNR reserve - individual reserving methods for predicting the RBNS reserve ] --- class: inverse, center, middle name: motivation # Motivation and strategies <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Motivation <br> Insurers observe the .hi-pink[detailed evolution] of .hi-pink[individual claims] over time: <br> <img src="image/development_claim.png" width="60%" style="display: block; margin: auto;" /> --- # Motivation (continued) <img src="image/lexis_claim.png" width="45%" style="display: block; margin: auto;" /> Observed claims are .hi-pink[censored] due to .hi-pink[delays] (reporting, settlement) in the claim development process. .KULbginline[Reserve]: future costs for claims that occurred in .hi-pink[past exposure periods] (claims B and C) * RBNS: Reported, But Not yet Settled claims (claim B) * IBNR: Incurred, But Not yet Reported claims (claim C). .KULbginline[Pricing]: all costs for claims that will occur in .hi-pink[future] insured .hi-pink[exposure periods] (claim D). --- # Three strategies for non-life reserving <img src="image/reserving_methods.png" width="67%" style="display: block; margin: auto;" /> --- # Three strategies for non-life reserving (continued) <br> ### Aggregate reserving First, we aggregate the past claim history into a .hi-pink[small number of summary statistics], <br> then we predict the total reserve based on these summary statistics. ### Granular reserving First, we aggregate the past claim history into a .hi-pink[large number of summary statistics], <br> then we predict the total reserve based on these summary statistics. ### Individual reserving First, we predict the future costs for .hi-pink[individual claims], <br> then we aggregate these individual predictions to predict the total reserve. --- # Data set used in this session We illustrate reserving analytics based on a .hi-pink[simulated dataset] registering the .hi-pink[detailed development] of 30,000 claims occurring between January 1, 2010 and December 31, 2020. Data is available in a daily and yearly format. .KULbginline[Daily] data: * with claim events recorded at daily precision * with records correspond to the reporting of a claim or a payment being made. The data is stored in a .rds file in the `data` directory in the course material: ```r # install.packages("rstudioapi") dir <- dirname(rstudioapi::getActiveDocumentContext()$path) setwd(dir) reserving_daily <- readRDS("data/reserving_data_daily.rds") ``` .tiny[ ``` ## accident_number accident_date reporting_date settlement_date reporting_delay settlement_delay payment_date payment_size ## 1 1 2012-12-02 2012-12-07 2013-03-10 5 98 2012-12-07 0.0000 ## 2 1 2012-12-02 2012-12-07 2013-03-10 5 98 2012-12-13 446.2251 ## 3 1 2012-12-02 2012-12-07 2013-03-10 5 98 2012-12-24 490.1792 ## 4 1 2012-12-02 2012-12-07 2013-03-10 5 98 2013-01-06 587.7409 ## 5 1 2012-12-02 2012-12-07 2013-03-10 5 98 2013-01-24 1726.0902 ## 6 2 2014-02-04 2014-02-10 2014-08-21 6 198 2014-02-10 0.0000 ``` ] --- # Data set used in this session We illustrate reserving analytics based on a .hi-pink[simulated dataset] registering the .hi-pink[detailed development] of 30,000 claims occurring between January 1, 2010 and December 31, 2020. Data is available in a daily and yearly format. .KULbginline[Yearly] data with one record per claim per development year: * for each development year the data registers: * close is 1 when the claim closes in the current development year, 0 otherwise * payment is 1 when there is at least one payment, 0 otherwise * size: total amount paid for the claim in the current development year The data is stored in a .rds file in the `data` directory in the course material: ```r reserving_yearly <- readRDS("data/reserving_data_yearly.rds") ``` .tiny[ ``` ## # A tibble: 6 x 10 ## accident_number accident_year reporting_year settlement_year development_year calendar_year size payment close is_closed ## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 1 2 2 3 1 2 936. 1 0 0 ## 2 1 2 2 3 2 3 2314. 1 1 1 ## 3 2 4 4 4 1 4 3537. 1 1 1 ## 4 3 6 6 6 1 6 1975. 1 1 1 ## 5 4 9 10 10 1 9 0 0 0 0 ## 6 4 9 10 10 2 10 3107. 1 1 1 ``` ] --- name: yourturn class: clear .left-column[ ## <i class="fa fa-edit"></i> <br> Your turn ] .right-column[ <br> In this warm up exercise you load the `reserving_daily` or `reserving_yearly` data and get some feel for the data. Choose for each task the format that is most convenient (daily or yearly). 1. Visualize the reporting and settlement delay of claims with a density plot. 2. When was the last payment registered in the data set? <br>What do you conclude? 3. What is the average number of payments per claim? 4. Calculate the number of claims per accident year. ] --- class: clear .pull-left[ For .hi-pink[Q.1] we plot the density of reporting and settlement delay: ```r ggplot(data = reserving_daily) + theme_bw() + geom_density(aes(reporting_delay, fill = 'reporting delay'), alpha = .5) + geom_density(aes(settlement_delay, fill = 'settlement_delay'), alpha = .5) + xlab('delay in days') + xlim(c(0, 1000)) ``` <!-- --> ] .pull-right[ For .hi-pink[Q.2] the last payment was in 2025, i.e. the simulated data is not yet censored! ```r max(reserving_daily$payment_date) ``` ``` ## [1] "2025-05-03" ``` ] --- class: clear .pull-left[ For .hi-pink[Q.3] we average the number of payments per claim. Count only records with `payment_size > 0` to exclude the reporting event. ```r reserving_daily %>% group_by(accident_number) %>% summarise(payments = sum(payment_size > 0)) %>% ungroup() %>% summarise(average = mean(payments)) ``` ``` ## # A tibble: 1 x 1 ## average ## <dbl> ## 1 2.38 ``` ] .pull-right[ For .hi-pink[Q.4] we group the data by accident year and summarise the number of claims. ```r library(tidyverse) reserving_yearly %>% filter(development_year == 1) %>% group_by(accident_year) %>% summarise(num_claims = n()) ``` ``` ## # A tibble: 11 x 2 ## accident_year num_claims ## * <dbl> <int> ## 1 0 2743 ## 2 1 2725 ## 3 2 2810 ## 4 3 2756 ## 5 4 2705 ## 6 5 2706 ## 7 6 2703 ## 8 7 2708 ## 9 8 2640 ## 10 9 2751 ## 11 10 2391 ``` We apply the filter `development_year == 1` to count each claim only once. Claim frequency is slightly lower in accident year 10. ] --- # Censoring the data <br> The simulated data is not censored, i.e. we observe the full development of all claims that occur before the end of 2020.<br> In practice the available data set will be censored and we will only observe: * reported claims * development information in past calendar years. We censor the data assuming that we observe data until the end of 2020: ```r observed_daily <- reserving_daily %>% filter(payment_date <= as.Date('2020-12-31')) unobserved_daily <- reserving_daily %>% filter(payment_date > as.Date('2020-12-31')) observed_yearly <- reserving_yearly %>% filter(calendar_year <= 10, reporting_year <= 10) unobserved_yearly <- reserving_yearly %>% filter(calendar_year > 10 | reporting_year > 10) ``` --- # IBNR and RBNS reserves Reserving models predict the total payments in the unobserved part of the data: ```r reserve_actual <- sum(unobserved_yearly$size) reserve_actual ``` ``` ## [1] 2149312 ``` Some reserving methods split this reserve into * IBNR: a reserve for Incurred, But Not (yet) Reported claims * RBNS: a reserve for Reported, But Not (yet) Settled claims. ```r unobserved_yearly %>% mutate(reported = (reporting_year <= 10)) %>% group_by(reported) %>% summarise(reserve = sum(size)) ``` ``` ## # A tibble: 2 x 2 ## reported reserve ## * <lgl> <dbl> ## 1 FALSE 134124. ## 2 TRUE 2015188. ``` --- class: inverse, center, middle name: vertical_aggregation # Reserving data structures: individual and aggregated data <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # From individual to aggregated data ### Individual triangle The data set `reserving_yearly` describes the development of .hi-pink[individual claims discretised by calendar year]. <br> We can represent the claim characteristics in this data set (settlement, payment, payment size) with .hi-pink[two-dimensional tables] in which the rows represent .hi-pink[individual claims] and columns represent .hi-pink[development years]. <br> For payment size, this table consists of cells `\(U^\texttt{size}_{k, j}\)`, denoting the total amount paid for claim `\(k\)` in development year `\(j\)`. <br> ### Runoff triangle Many reserving models go one step further and remove individual claim characteristics by aggregating claims by occurrence year. The data is then represented in a (small) two-dimensional table, the so-called .hi-pink[incremental runoff triangle]. `$$X^{\texttt{size}}_{i, j} = \sum_{\texttt{occ.year(k)} = i} U^\texttt{size}_{k, j}.$$` As a result of the Law of Large Numbers (LLN), aggregating data from many claims into runoff triangles .hi-pink[reduces the variance] when all claims are independent realizations from the same distribution. --- # From individual to aggregated data (continued) .pull-left[ We construct a runoff triangle by aggregating the individual claims by `accident_year`. ```r observed_yearly %>% group_by(accident_year, development_year) %>% summarise(value = sum(size)) %>% pivot_wider(values_from = value, names_from = development_year, names_prefix = 'DY.') ``` `pivot_wider` creates for each unique value of `names_from` a new column ] .pull-right[ .tiny[ ``` ## # A tibble: 11 x 10 ## # Groups: accident_year [11] ## accident_year DY.1 DY.2 DY.3 DY.4 DY.5 DY.6 DY.7 DY.8 DY.9 ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 0 4075901. 1610076. 112557. 24668. 10680. 8840. 9479. 6217. 3468. ## 2 1 4037070. 1461939. 62622. 18518. 0 NA NA NA NA ## 3 2 4322579. 1774121. 105850. 41429. 24099. 7031. 0 NA NA ## 4 3 4162489. 1616297. 104487. 15982. NA NA NA NA NA ## 5 4 3903230. 1605598. 160906. 24194. 11622. NA NA NA NA ## 6 5 4052230. 1623940. 131721. 26686. 14096. 6262. NA NA NA ## 7 6 4087125. 1593968. 96484. 40310. 4930. NA NA NA NA ## 8 7 4059058. 1771141. 170375. 37317. NA NA NA NA NA ## 9 8 4033678. 1309634. 140624. NA NA NA NA NA NA ## 10 9 4141278. 1615333. NA NA NA NA NA NA NA ## 11 10 3443223. NA NA NA NA NA NA NA NA ``` ] ] --- # From individual to aggregated data (continued) .pull-left[ A more sophisticated function for computing incremental runoff triangles is available in the R code. ```r incremental_triangle(observed_yearly, variable = 'payment', lower_na = TRUE) ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 2054 523 17 6 2 1 1 1 1 0 0 ## [2,] 2074 482 16 4 0 0 0 0 0 0 NA ## [3,] 2106 549 25 5 4 2 0 0 0 NA NA ## [4,] 2033 532 24 5 0 0 0 0 NA NA NA ## [5,] 2008 473 32 7 2 0 0 NA NA NA NA ## [6,] 2035 498 26 4 2 2 NA NA NA NA NA ## [7,] 2028 487 17 5 1 NA NA NA NA NA NA ## [8,] 2030 528 34 9 NA NA NA NA NA NA NA ## [9,] 2005 444 24 NA NA NA NA NA NA NA NA ## [10,] 2100 507 NA NA NA NA NA NA NA NA NA ## [11,] 1784 NA NA NA NA NA NA NA NA NA NA ``` ] ] .pull-right[ Using this function you can easily inspect the reserving data with multiple triangles. ```r incremental_triangle(observed_yearly, variable = 'size', lower_na = TRUE) ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 4075901 1610076 112556.58 24668.43 10679.546 8840.487 9478.77 6217.266 3468.364 0 0 ## [2,] 4037070 1461939 62622.23 18517.80 0.000 0.000 0.00 0.000 0.000 0 NA ## [3,] 4322579 1774121 105850.01 41429.20 24098.625 7030.788 0.00 0.000 0.000 NA NA ## [4,] 4162489 1616297 104486.62 15982.39 0.000 0.000 0.00 0.000 NA NA NA ## [5,] 3903230 1605598 160905.57 24194.32 11622.131 0.000 0.00 NA NA NA NA ## [6,] 4052230 1623940 131721.42 26685.98 14095.805 6261.739 NA NA NA NA NA ## [7,] 4087125 1593968 96483.79 40310.11 4930.009 NA NA NA NA NA NA ## [8,] 4059058 1771141 170374.72 37317.06 NA NA NA NA NA NA NA ## [9,] 4033678 1309634 140624.40 NA NA NA NA NA NA NA NA ## [10,] 4141278 1615333 NA NA NA NA NA NA NA NA NA ## [11,] 3443223 NA NA NA NA NA NA NA NA NA NA ``` ] ] --- # Cumulative runoff triangles .hi-pink[Cumulative runoff triangles] are constructed by computing the cumulative row sum of an .hi-pink[incremental runoff triangle], i.e. `$$C^{\texttt{size}}_{i, j} = \sum_{l = 1}^{j} X^{\texttt{size}}_{i, l}.$$` When using a new reserving method, check whether the incremental or cumulative runoff triangle is required! .pull-left[ ```r incremental_triangle(observed_yearly, variable = 'size', lower_na = TRUE) ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 4075901 1610076 112556.58 24668.43 10679.546 8840.487 9478.77 6217.266 3468.364 0 0 ## [2,] 4037070 1461939 62622.23 18517.80 0.000 0.000 0.00 0.000 0.000 0 NA ## [3,] 4322579 1774121 105850.01 41429.20 24098.625 7030.788 0.00 0.000 0.000 NA NA ## [4,] 4162489 1616297 104486.62 15982.39 0.000 0.000 0.00 0.000 NA NA NA ## [5,] 3903230 1605598 160905.57 24194.32 11622.131 0.000 0.00 NA NA NA NA ## [6,] 4052230 1623940 131721.42 26685.98 14095.805 6261.739 NA NA NA NA NA ## [7,] 4087125 1593968 96483.79 40310.11 4930.009 NA NA NA NA NA NA ## [8,] 4059058 1771141 170374.72 37317.06 NA NA NA NA NA NA NA ## [9,] 4033678 1309634 140624.40 NA NA NA NA NA NA NA NA ## [10,] 4141278 1615333 NA NA NA NA NA NA NA NA NA ## [11,] 3443223 NA NA NA NA NA NA NA NA NA NA ``` ] ] .pull-right[ ```r cumulative_triangle(observed_yearly, variable = 'size', lower_na = TRUE) ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 4075901 5685977 5798533 5823202 5833881 5842722 5852200 5858418 5861886 5861886 5861886 ## [2,] 4037070 5499009 5561632 5580150 5580150 5580150 5580150 5580150 5580150 5580150 NA ## [3,] 4322579 6096700 6202550 6243979 6268078 6275109 6275109 6275109 6275109 NA NA ## [4,] 4162489 5778787 5883273 5899256 5899256 5899256 5899256 5899256 NA NA NA ## [5,] 3903230 5508828 5669733 5693928 5705550 5705550 5705550 NA NA NA NA ## [6,] 4052230 5676170 5807891 5834577 5848673 5854935 NA NA NA NA NA ## [7,] 4087125 5681093 5777577 5817887 5822817 NA NA NA NA NA NA ## [8,] 4059058 5830199 6000574 6037891 NA NA NA NA NA NA NA ## [9,] 4033678 5343311 5483936 NA NA NA NA NA NA NA NA ## [10,] 4141278 5756610 NA NA NA NA NA NA NA NA NA ## [11,] 3443223 NA NA NA NA NA NA NA NA NA NA ``` ] ] --- class: inverse, center, middle name: chainladder # Claims reserving with triangles <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Mack chain ladder method <br> Assumptions underneath Mack's (1993) .hi-pink[chain ladder method]: 1. There exist development factors `\(f_j\)` such that `$$E(C_{i, j+1} \mid C_{i,1}, \ldots, C_{i, j}) = C_{i, j} \cdot f_j \quad \text{for} \quad 1 \leq i \leq l, 1 \leq j \leq l-1,$$` with `\(l\)` the dimension of the runoff triangle. 2. There exist parameters `\(\sigma_j\)` such that `$$Var(C_{i, j+1} \mid C_{i,1}, \ldots, C_{i, j}) = C_{i, j} \cdot \sigma^2_j \quad \text{for} \quad 1 \leq i \leq l, 1 \leq j \leq l-1.$$` 3. Occurrence years are independent. .KULbginline[Remarks]: * Method based on the .hi-pink[cumulative triangle]! * Assumption 1. and 2. add a .hi-pink[Markov property]. Future evolutions of the reserve depend only on the last known situation! --- # Mack chain ladder method (continued) <br> Estimating the .hi-pink[development factors] `\(\hat{f}_j\)`: `$$\hat{f}_{j} = \frac{\sum_{i=1}^{l-j} C_{i,j+1}}{\sum_{i=1}^{l-j} C_{i,j}}.$$` ```r triangle <- cumulative_triangle(observed_yearly, variable = 'size') l <- nrow(triangle) f <- rep(0, l-1) for(j in 1:(l-1)) { f[j] <- sum(triangle[1:(l-j), j+1]) / sum(triangle[1:(l-j), j]) } f ``` ``` ## [1] 1.391002 1.021245 1.004906 1.001600 1.000630 1.000323 1.000263 1.000196 1.000000 1.000000 ``` Development factors should converge to 1 for high development years. --- # Mack chain ladder method (continued) Use the development factors to estimate cells in the lower triangle ( `\(i + j > l + 1\)` ): `$$\hat{C}_{i, j} = \hat{C}_{i, j-1} \cdot \hat{f}_{j-1} \quad \text{for} \quad i + j > l + 1.$$` ```r triangle_completed <- triangle for(j in 2:l) { triangle_completed[l:(l-j+2), j] <- triangle_completed[l:(l-j+2), j-1] * f[j-1] } ``` .pull-left[ Completed cumulative triangle: ```r triangle_completed ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 4075901 5685977 5798533 5823202 5833881 5842722 5852200 5858418 5861886 5861886 5861886 ## [2,] 4037070 5499009 5561632 5580150 5580150 5580150 5580150 5580150 5580150 5580150 5580150 ## [3,] 4322579 6096700 6202550 6243979 6268078 6275109 6275109 6275109 6275109 6275109 6275109 ## [4,] 4162489 5778787 5883273 5899256 5899256 5899256 5899256 5899256 5900411 5900411 5900411 ## [5,] 3903230 5508828 5669733 5693928 5705550 5705550 5705550 5707052 5708170 5708170 5708170 ## [6,] 4052230 5676170 5807891 5834577 5848673 5854935 5856829 5858371 5859518 5859518 5859518 ## [7,] 4087125 5681093 5777577 5817887 5822817 5826485 5828370 5829905 5831046 5831046 5831046 ## [8,] 4059058 5830199 6000574 6037891 6047551 6051361 6053318 6054913 6056098 6056098 6056098 ## [9,] 4033678 5343311 5483936 5510838 5519655 5523132 5524919 5526374 5527456 5527456 5527456 ## [10,] 4141278 5756610 5878910 5907750 5917202 5920930 5922845 5924405 5925565 5925565 5925565 ## [11,] 3443223 4789529 4891282 4915278 4923142 4926243 4927837 4929134 4930100 4930100 4930100 ``` ] ] .pull-right[ Completed incremental triangle: ```r require(ChainLadder) cum2incr(triangle_completed) ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 4075901 1610076 112556.58 24668.43 10679.546 8840.487 9478.770 6217.266 3468.3641 0 0 ## [2,] 4037070 1461939 62622.23 18517.80 0.000 0.000 0.000 0.000 0.0000 0 0 ## [3,] 4322579 1774121 105850.01 41429.20 24098.625 7030.788 0.000 0.000 0.0000 0 0 ## [4,] 4162489 1616297 104486.62 15982.39 0.000 0.000 0.000 0.000 1155.0830 0 0 ## [5,] 3903230 1605598 160905.57 24194.32 11622.131 0.000 0.000 1502.662 1117.4493 0 0 ## [6,] 4052230 1623940 131721.42 26685.98 14095.805 6261.739 1893.935 1542.505 1147.0778 0 0 ## [7,] 4087125 1593968 96483.79 40310.11 4930.009 3667.976 1884.732 1535.009 1141.5040 0 0 ## [8,] 4059058 1771141 170374.72 37317.06 9660.235 3809.544 1957.475 1594.254 1185.5609 0 0 ## [9,] 4033678 1309634 140624.40 26902.60 8816.985 3477.005 1786.605 1455.090 1082.0723 0 0 ## [10,] 4141278 1615333 122299.67 28840.23 9452.018 3727.432 1915.283 1559.891 1160.0073 0 0 ## [11,] 3443223 1346306 101753.93 23995.21 7864.126 3101.242 1593.525 1297.837 965.1318 0 0 ``` ] ] --- # Mack chain ladder method (continued) <br> The estimated reserve is the sum of the estimates in the .hi-pink[lower half] of the .hi-pink[incremental runoff triangle]. ```r triangle_completed_incr <- cum2incr(triangle_completed) lower_triangle <- row(triangle_completed_incr) + col(triangle_completed_incr) > l+1 reserve_cl <- sum(triangle_completed_incr[lower_triangle]) data.frame(reserve_cl = reserve_cl, reserve_actual = reserve_actual, difference = reserve_cl - reserve_actual, relative_difference_pct = (reserve_cl - reserve_actual) / reserve_actual * 100) ``` ``` ## reserve_cl reserve_actual difference relative_difference_pct ## 1 1734146 2149312 -415165.1 -19.31619 ``` --- # Mack chain ladder method (continued) These calculations are already implemented in the {ChainLadder} package. ```r require(ChainLadder) triangle <- cumulative_triangle(observed_yearly, variable = 'size') MackChainLadder(triangle) ``` .pull-left[ .tiny[ ``` ## MackChainLadder(Triangle = triangle) ## ## Latest Dev.To.Date Ultimate IBNR Mack.S.E CV(IBNR) ## 1 5,861,886 1.000 5,861,886 0.00e+00 0 NaN ## 2 5,580,150 1.000 5,580,150 0.00e+00 635 Inf ## 3 6,275,109 1.000 6,275,109 9.31e-10 693 7.44e+11 ## 4 5,899,256 1.000 5,900,411 1.16e+03 2,417 2.09e+00 ## 5 5,705,550 1.000 5,708,170 2.62e+03 4,164 1.59e+00 ## 6 5,854,935 0.999 5,859,518 4.58e+03 6,288 1.37e+00 ## 7 5,822,817 0.999 5,831,046 8.23e+03 7,643 9.29e-01 ## 8 6,037,891 0.997 6,056,098 1.82e+04 11,916 6.54e-01 ## 9 5,483,936 0.992 5,527,456 4.35e+04 14,755 3.39e-01 ## 10 5,756,610 0.971 5,925,565 1.69e+05 40,015 2.37e-01 ## 11 3,443,223 0.698 4,930,100 1.49e+06 126,084 8.48e-02 ## ## Totals ## Latest: 61,721,361.87 ## Dev: 0.97 ## Ultimate: 63,455,508.27 ## IBNR: 1,734,146.40 ## Mack.S.E 136,785.03 ## CV(IBNR): 0.08 ``` ] ] .pull-right[ `Latest`: amount already paid <br> `Ultimate`: estimated total amount (= amount paid + reserve) <br> `IBNR`: estimated total reserve, IBNR + RBNS! <br> `Mack.S.E`: estimated standard deviation of the reserve. ] --- # GLM chain ladder method The chain ladder reserve estimate can also be obtained by assuming a Poisson distribution for the incremental runoff triangle with multiplicative mean, i.e. $$ X_{i,j} \sim \texttt{POI}(\alpha_i \cdot \beta_j), \quad 1 \leq i,j \leq l.$$ .pull-left[ ```r triangle <- incremental_triangle(observed_yearly, variable = 'size') triangle_long <- data.frame( occ.year = as.numeric(row(triangle)), dev.year = as.numeric(col(triangle)), size = as.numeric(triangle)) ``` ] .pull-right[ ```r head(triangle_long) ``` ``` ## occ.year dev.year size ## 1 1 1 4075901 ## 2 2 1 4037070 ## 3 3 1 4322579 ## 4 4 1 4162489 ## 5 5 1 3903230 ## 6 6 1 4052230 ``` ] --- # GLM chain ladder method (continued) The chain ladder reserve estimate can also be obtained by assuming a Poisson distribution for the incremental runoff triangle with multiplicative mean, i.e. $$ X_{i,j} \sim \texttt{POI}(\alpha_i \cdot \beta_j), \quad 1 \leq i,j \leq l.$$ .pull-left[ This Poisson model can be estimated using the `glm` (Generalized Linear Model) routine in R: `$$\log(E(X_{i,j})) = \log(\alpha_i) + \log(\beta_j).$$` ```r fit <- glm(size ~ factor(occ.year) + factor(dev.year), data = triangle_long, family = poisson(link = log)) ``` `factor`: Treat the input as a categorical instead of a numeric variable. ] .pull-right[ ```r summary(fit) ``` .tiny[ ``` ## ## Call: ## glm(formula = size ~ factor(occ.year) + factor(dev.year), family = poisson(link = log), ## data = triangle_long) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -169.664 -55.719 -1.113 42.518 123.909 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 1.523e+01 4.220e-04 36079.449 <2e-16 *** ## factor(occ.year)2 -4.926e-02 5.914e-04 -83.281 <2e-16 *** ## factor(occ.year)3 6.812e-02 5.744e-04 118.589 <2e-16 *** ## factor(occ.year)4 6.550e-03 5.832e-04 11.232 <2e-16 *** ## factor(occ.year)5 -2.657e-02 5.881e-04 -45.183 <2e-16 *** ## factor(occ.year)6 -4.040e-04 5.843e-04 -0.691 0.489 ## factor(occ.year)7 -5.275e-03 5.851e-04 -9.015 <2e-16 *** ## factor(occ.year)8 3.259e-02 5.799e-04 56.205 <2e-16 *** ## factor(occ.year)9 -5.874e-02 5.943e-04 -98.853 <2e-16 *** ## factor(occ.year)10 1.080e-02 5.873e-04 18.397 <2e-16 *** ## factor(occ.year)11 -1.731e-01 6.845e-04 -252.915 <2e-16 *** ## factor(dev.year)2 -9.390e-01 2.950e-04 -3183.012 <2e-16 *** ## factor(dev.year)3 -3.522e+00 9.734e-04 -3617.713 <2e-16 *** ## factor(dev.year)4 -4.966e+00 2.096e-03 -2369.345 <2e-16 *** ## factor(dev.year)5 -6.082e+00 3.914e-03 -1554.030 <2e-16 *** ## factor(dev.year)6 -7.012e+00 6.724e-03 -1042.827 <2e-16 *** ## factor(dev.year)7 -7.678e+00 1.027e-02 -747.396 <2e-16 *** ## factor(dev.year)8 -7.883e+00 1.268e-02 -621.512 <2e-16 *** ## factor(dev.year)9 -8.180e+00 1.698e-02 -481.669 <2e-16 *** ## factor(dev.year)10 -2.450e+01 4.491e+01 -0.546 0.585 ## factor(dev.year)11 -2.453e+01 6.352e+01 -0.386 0.699 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for poisson family taken to be 1) ## ## Null deviance: 139924691 on 65 degrees of freedom ## Residual deviance: 305443 on 45 degrees of freedom ## (55 observations deleted due to missingness) ## AIC: Inf ## ## Number of Fisher Scoring iterations: 7 ``` ] ] --- # GLM chain ladder method .pull-left[ Predict the cells in the lower triangle ```r lower_triangle <- triangle_long$occ.year + triangle_long$dev.year > l + 1 triangle_long$size[lower_triangle] <- predict(fit, newdata = triangle_long[lower_triangle, ], type = 'response') triangle_long %>% pivot_wider(values_from = size, names_from = dev.year, names_prefix = 'DY.') ``` .tiny[ ``` ## # A tibble: 11 x 12 ## occ.year DY.1 DY.2 DY.3 DY.4 DY.5 DY.6 DY.7 DY.8 DY.9 DY.10 DY.11 ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 1 4075901. 1610076. 112557. 24668. 10680. 8840. 9479. 6217. 3468. 0 0 ## 2 2 4037070. 1461939. 62622. 18518. 0 0 0 0 0 0 0.0000868 ## 3 3 4322579. 1774121. 105850. 41429. 24099. 7031. 0 0 0 0.000100 0.0000976 ## 4 4 4162489. 1616297. 104487. 15982. 0 0 0 0 1155. 0.0000941 0.0000918 ## 5 5 3903230. 1605598. 160906. 24194. 11622. 0 0 1503. 1117. 0.0000910 0.0000888 ## 6 6 4052230. 1623940. 131721. 26686. 14096. 6262. 1894. 1543. 1147. 0.0000934 0.0000912 ## 7 7 4087125. 1593968. 96484. 40310. 4930. 3668. 1885. 1535. 1142. 0.0000930 0.0000907 ## 8 8 4059058. 1771141. 170375. 37317. 9660. 3810. 1957. 1594. 1186. 0.0000965 0.0000942 ## 9 9 4033678. 1309634. 140624. 26903. 8817. 3477. 1787. 1455. 1082. 0.0000881 0.0000860 ## 10 10 4141278. 1615333. 122300. 28840. 9452. 3727. 1915. 1560. 1160. 0.0000945 0.0000922 ## 11 11 3443223. 1346306. 101754. 23995. 7864. 3101. 1594. 1298. 965. 0.0000786 0.0000767 ``` ] ] .pull-right[ Compute the reserve ```r reserve_glm <- sum(triangle_long$size[lower_triangle]) reserve_glm ``` ``` ## [1] 1734146 ``` ] --- name: yourturn class: clear .left-column[ ## <i class="fa fa-edit"></i> <br> Your turn ] .right-column[ <br> We have demonstrated the estimation of the reserve (future amount paid). In this exercise you will use the same techniques to estimate the number of future development years with at least one payment. Use the yearly data set `reserving_yearly` for this exercise. 1. Compute the actual number of future payments from the unobserved data set. 2. Create a cumulative triangle containing the number of payments per accident and development year. 3. Estimate the future number of payments with the chain ladder method implemented in the {ChainLadder} package. 4. Compute the difference between the estimated and actual number of payments. Express this error in terms of standard deviations. ] --- class: clear .pull-left[ For .hi-pink[Q.1] we compute the actual number of future payments from `unobserved_yearly`. ```r payment_actual <- sum(unobserved_yearly$payment) payment_actual ``` ``` ## [1] 617 ``` For .hi-pink[Q.2] we use `cumulative_triangle` with `variable = 'payment'` to create a payment triangle . ```r triangle <- cumulative_triangle(observed_yearly, variable = 'payment') ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 2054 2577 2594 2600 2602 2603 2604 2605 2606 2606 2606 ## [2,] 2074 2556 2572 2576 2576 2576 2576 2576 2576 2576 NA ## [3,] 2106 2655 2680 2685 2689 2691 2691 2691 2691 NA NA ## [4,] 2033 2565 2589 2594 2594 2594 2594 2594 NA NA NA ## [5,] 2008 2481 2513 2520 2522 2522 2522 NA NA NA NA ## [6,] 2035 2533 2559 2563 2565 2567 NA NA NA NA NA ## [7,] 2028 2515 2532 2537 2538 NA NA NA NA NA NA ## [8,] 2030 2558 2592 2601 NA NA NA NA NA NA NA ## [9,] 2005 2449 2473 NA NA NA NA NA NA NA NA ## [10,] 2100 2607 NA NA NA NA NA NA NA NA NA ## [11,] 1784 NA NA NA NA NA NA NA NA NA NA ``` ] ] .pull-right[ For .hi-pink[Q.3] we appy the chain ladder method to this payment triangle. ```r require(ChainLadder) cl <- MackChainLadder(triangle) cl ``` .tiny[ ``` ## MackChainLadder(Triangle = triangle) ## ## Latest Dev.To.Date Ultimate IBNR Mack.S.E CV(IBNR) ## 1 2,606 1.000 2,606 0.000 0.000 NaN ## 2 2,576 1.000 2,576 0.000 0.121 Inf ## 3 2,691 1.000 2,691 0.000 0.125 Inf ## 4 2,594 1.000 2,594 0.330 0.677 2.0532 ## 5 2,522 1.000 2,523 0.561 0.862 1.5354 ## 6 2,567 1.000 2,568 0.769 0.998 1.2973 ## 7 2,538 0.999 2,540 1.577 1.435 0.9101 ## 8 2,601 0.999 2,604 3.200 2.069 0.6464 ## 9 2,473 0.997 2,481 8.443 2.673 0.3166 ## 10 2,607 0.987 2,640 33.472 7.535 0.2251 ## 11 1,784 0.793 2,250 466.225 28.387 0.0609 ## ## Totals ## Latest: 27,559.00 ## Dev: 0.98 ## Ultimate: 28,073.58 ## IBNR: 514.58 ## Mack.S.E 30.08 ## CV(IBNR): 0.06 ``` ] ] --- class: clear .pull-left[ For .hi-pink[Q.4] we compute the prediction error of the chain ladder method. ```r ultimate <- sum(cum2incr(cl$FullTriangle)) already_paid <- sum(cum2incr(cl$Triangle), na.rm = TRUE) payment_cl <- ultimate - already_paid sigma_cl <- as.numeric(cl$Total.Mack.S.E) error = payment_actual - payment_cl round(c(error = error, pct_error = error / payment_actual * 100, std.dev = error / sigma_cl),2) ``` ``` ## error pct_error std.dev ## 102.42 16.60 3.41 ``` For our data set, the chain ladder method significantly underestimates the number of future payments. ] .pull-right[ Actual triangle: .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 2054 523 17 6 2 1 1 1 1 0 0 ## [2,] 2074 482 16 4 0 0 0 0 0 0 0 ## [3,] 2106 549 25 5 4 2 0 0 0 0 0 ## [4,] 2033 532 24 5 0 0 0 0 0 0 0 ## [5,] 2008 473 32 7 2 0 0 0 0 0 0 ## [6,] 2035 498 26 4 2 2 1 0 0 0 0 ## [7,] 2028 487 17 5 1 1 0 0 0 0 0 ## [8,] 2030 528 34 9 3 1 0 0 0 0 0 ## [9,] 2005 444 24 6 2 0 0 0 0 0 0 ## [10,] 2100 507 19 5 3 1 0 0 0 0 0 ## [11,] 1784 524 39 8 3 1 0 0 0 0 0 ``` ] Predicted triangle: .tiny[ ``` ## dev ## origin 1 2 3 4 5 6 7 8 9 10 11 ## 1 2054 523.0 17.00000 6.000000 2.000000 1.0000000 1.0000000 1.0000000 1.0000000 0 0 ## 2 2074 482.0 16.00000 4.000000 0.000000 0.0000000 0.0000000 0.0000000 0.0000000 0 0 ## 3 2106 549.0 25.00000 5.000000 4.000000 2.0000000 0.0000000 0.0000000 0.0000000 0 0 ## 4 2033 532.0 24.00000 5.000000 0.000000 0.0000000 0.0000000 0.0000000 0.3295224 0 0 ## 5 2008 473.0 32.00000 7.000000 2.000000 0.0000000 0.0000000 0.2409938 0.3204066 0 0 ## 6 2035 498.0 26.00000 4.000000 2.000000 2.0000000 0.1976744 0.2453127 0.3261488 0 0 ## 7 2028 487.0 17.00000 5.000000 1.000000 0.8161821 0.1955041 0.2426194 0.3225679 0 0 ## 8 2030 528.0 34.00000 9.000000 1.582905 0.8369510 0.2004790 0.2487932 0.3307761 0 0 ## 9 2005 444.0 24.00000 5.394067 1.508290 0.7974988 0.1910288 0.2370655 0.3151839 0 0 ## 10 2100 507.0 24.48796 5.739759 1.604952 0.8486084 0.2032713 0.2522584 0.3353832 0 0 ## 11 1784 437.7 20.86878 4.891454 1.367749 0.7231888 0.1732290 0.2149761 0.2858155 0 0 ``` ] ] --- class: inverse, center, middle name: chainladderfail # When the chain ladder method fails <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Detecting when the chain ladder method fails The chain ladder method assumes that claims from all occurrence years follow the .hi-pink[same development pattern]. <br> We construct various triangles to assess this assumption. .pull-left[ Number of open claims at the start of the year: ```r triangle_open <- incremental_triangle( observed_yearly %>% mutate(open = calendar_year <= settlement_year), variable = 'open') triangle_open ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 2743 722 28 6 2 1 1 1 1 0 0 ## [2,] 2725 698 24 4 1 0 0 0 0 0 NA ## [3,] 2810 789 35 5 5 2 1 0 0 NA NA ## [4,] 2756 743 33 6 0 0 0 0 NA NA NA ## [5,] 2705 677 46 9 3 0 0 NA NA NA NA ## [6,] 2706 689 34 8 2 2 NA NA NA NA NA ## [7,] 2703 706 27 7 1 NA NA NA NA NA NA ## [8,] 2708 730 45 10 NA NA NA NA NA NA NA ## [9,] 2640 647 27 NA NA NA NA NA NA NA NA ## [10,] 2751 711 NA NA NA NA NA NA NA NA NA ## [11,] 2328 NA NA NA NA NA NA NA NA NA NA ``` ] ] .pull-right[ Number of open claims at the end of the year: ```r triangle_open_end <- incremental_triangle( observed_yearly %>% mutate(open_end = (calendar_year < settlement_year)), variable = 'open_end') triangle_open_end ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 722 28 6 2 1 1 1 1 0 0 0 ## [2,] 698 24 4 1 0 0 0 0 0 0 NA ## [3,] 789 35 5 5 2 1 0 0 0 NA NA ## [4,] 743 33 6 0 0 0 0 0 NA NA NA ## [5,] 677 46 9 3 0 0 0 NA NA NA NA ## [6,] 689 34 8 2 2 1 NA NA NA NA NA ## [7,] 706 27 7 1 1 NA NA NA NA NA NA ## [8,] 730 45 10 4 NA NA NA NA NA NA NA ## [9,] 647 27 7 NA NA NA NA NA NA NA NA ## [10,] 711 30 NA NA NA NA NA NA NA NA NA ## [11,] 644 NA NA NA NA NA NA NA NA NA NA ``` ] ] Less claims occurred during the last accident year, but of those that occurred many are still open. --- # Detecting when the chain ladder method fails The chain ladder method assumes that claims from all occurrence years follow the .hi-pink[same development pattern]. <br> We construct various triangles to assess this assumption: .pull-left[ Percentage of claims open at the start of the year for which a payment is made: ```r triangle_payment <- incremental_triangle( observed_yearly, variable = 'payment') triangle_payment / triangle_open ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 0.7488152 0.7243767 0.6071429 1.0000000 1.0000000 1 1 1 1 NaN NaN ## [2,] 0.7611009 0.6905444 0.6666667 1.0000000 0.0000000 NaN NaN NaN NaN NaN NA ## [3,] 0.7494662 0.6958175 0.7142857 1.0000000 0.8000000 1 0 NaN NaN NA NA ## [4,] 0.7376633 0.7160162 0.7272727 0.8333333 NaN NaN NaN NaN NA NA NA ## [5,] 0.7423290 0.6986706 0.6956522 0.7777778 0.6666667 NaN NaN NA NA NA NA ## [6,] 0.7520325 0.7227866 0.7647059 0.5000000 1.0000000 1 NA NA NA NA NA ## [7,] 0.7502775 0.6898017 0.6296296 0.7142857 1.0000000 NA NA NA NA NA NA ## [8,] 0.7496307 0.7232877 0.7555556 0.9000000 NA NA NA NA NA NA NA ## [9,] 0.7594697 0.6862442 0.8888889 NA NA NA NA NA NA NA NA ## [10,] 0.7633588 0.7130802 NA NA NA NA NA NA NA NA NA ## [11,] 0.7663230 NA NA NA NA NA NA NA NA NA NA ``` ] ] .pull-right[ Average amount paid given a payment is made: ```r triangle_size <- incremental_triangle( observed_yearly, variable = 'size') triangle_size / triangle_payment ``` .tiny[ ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] ## [1,] 1984.372 3078.539 6620.975 4111.405 5339.773 8840.487 9478.77 6217.266 3468.364 NaN NaN ## [2,] 1946.514 3033.069 3913.889 4629.451 NaN NaN NaN NaN NaN NaN NA ## [3,] 2052.507 3231.551 4234.001 8285.841 6024.656 3515.394 NaN NaN NaN NA NA ## [4,] 2047.462 3038.153 4353.609 3196.478 NaN NaN NaN NaN NA NA NA ## [5,] 1943.840 3394.499 5028.299 3456.332 5811.065 NaN NaN NA NA NA NA ## [6,] 1991.268 3260.923 5066.209 6671.495 7047.903 3130.870 NA NA NA NA NA ## [7,] 2015.347 3273.035 5675.517 8062.022 4930.009 NA NA NA NA NA NA ## [8,] 1999.536 3354.434 5011.021 4146.339 NA NA NA NA NA NA NA ## [9,] 2011.809 2949.626 5859.350 NA NA NA NA NA NA NA NA ## [10,] 1972.037 3186.061 NA NA NA NA NA NA NA NA NA ## [11,] 1930.058 NA NA NA NA NA NA NA NA NA NA ``` ] ] --- # Detecting when the chain ladder method fails The chain ladder method assumes that claims from all occurrence years follow the .hi-pink[same development pattern]. <br> We inspect the (granular) daily claims data: .pull-left[ ```r # Select one observation per claim claims <- observed_daily %>% group_by(accident_number) %>% slice(1) %>% ungroup() occ_intensity <- claims %>% group_by(accident_date) %>% summarise(count = n()) require(zoo) occ_intensity$moving_average <- rollmean(occ_intensity$count, 30, na.pad = TRUE) ggplot(occ_intensity) + theme_bw() + geom_point(aes(x = accident_date, y = count)) + geom_line(aes(x = accident_date, y = moving_average), size = 1, color = 'blue') + ggtitle('Evolution of claim frequency') ``` ] .pull-right[ <!-- --> The moving average indicates a period with decreased claim occurrence intensity in 2020. ] --- # Detecting when the chain ladder method fails The chain ladder method assumes that claims from all occurrence years follow the .hi-pink[same development pattern]. <br> We inspect the (granular) daily claims data: .pull-left[ ```r require(lubridate) claims <- claims %>% mutate(start_year = floor_date(accident_date, unit = 'year'), time = as.numeric(accident_date - start_year) / 366, accident_year = year(accident_date), reporting_year = year(reporting_date)) %>% filter(accident_year == reporting_year) ggplot(claims) + theme_bw() + geom_density(aes(x = time, group = factor(accident_year), color = factor(accident_year))) ``` We apply `filter(accident_year == reporting_year)` such that the same amount of information is available for all occurrence years. ] .pull-right[ Density showing claim occurrences within each accident year. The chain ladder method works best when the densities look similar across accident years. <!-- --> ] The occurrence intensity in accident year 2020 deviates from the other accident years! --- # Detecting when the chain ladder method fails The data contains a drop in claim frequency during March, April, May 2020, with .hi-pink[less claims occurring] due to the first .hi-pink[COVID-19 lockdown]. Such changes in the claim occurrence and development pattern can severly impact chain ladder performance. Other situations that might impact the performance of the chain ladder are: .pull-left[ * inflation: * to be detected via the triangle `size` * changing product mix: * to be detected via the triangles `close`, `payment` and `size` * internal changes in claim handling: * to be detected via the triangles `close` and `payment` ] .pull-right[ * extreme weather events: * to be detected via the yearly distribution of occurrence dates * an extremely large claim: * to be detected via the triangle `size`. ] --- # Fixing the chain ladder method .pull-left[ Making the .hi-pink[data more homogeneous] using: * .hi-pink[group] claims by: * claim characteristics, e.g. a separate triangle for fire and water related claims in home insurance. * occurrence time within a calendar year * .hi-pink[omit]: * deviating years/cells, especially if they occurred long ago * .hi-pink[adjust] for: * inflation. The appropriate method(s) depend on the data inspection from the previous slides. ] .pull-right[ Applied to the .hi-pink[COVID-19 reserving data set]: * claim frequency has changed in the months March, April, May * only 2020, i.e. accident year 10, is affected. As 2020 is a recent calendar year, we cannot omit accident year 10 from the data. We illustrate two approaches to improve the homogeneity across the accident years: * applying the chain ladder method to more granular data * grouping claims by occurrence month. ] --- # Monthly chain ladder A more .hi-pink[granular chain ladder] offers more flexibility/parameters to capture fluctuations in the .hi-pink[fluctuations in the occurrence process]. .pull-left[ ```r require(lubridate) claims <- observed_daily %>% group_by(accident_number) %>% slice(1) %>% ungroup() %>% mutate(start_month = floor_date(accident_date, unit = 'month'), time = as.numeric(accident_date - start_month) / 31, accident_month = format(accident_date, '%Y%m'), reporting_month = format(reporting_date, '%Y%m')) %>% filter(accident_month == reporting_month) ggplot(claims) + theme_bw() + geom_density(aes(x = time, group = factor(accident_month), color = factor(accident_month))) + theme(legend.position = 'none') ``` ] .pull-right[ We compare the distribution of claim occurrences within each calendar month. <!-- --> In real (non simulated) data, these monthly densities might be disturbed by seasonal effects, weekends and holidays. ] --- # Monthly chain ladder <br> Example code to construct a .hi-pink[monthly triangle] using the observed daily data: ```r triangle_month <- observed_daily %>% mutate(accident_month = year(accident_date)*12 + month(accident_date) - 2010*12, development_month = year(payment_date)*12 + month(payment_date) - 2010*12 - accident_month) %>% group_by(accident_month, development_month) %>% summarise(size = sum(payment_size)) %>% ungroup() %>% complete(expand.grid(accident_month = 1:132, development_month = 0:131), fill = list(size = 0)) %>% mutate(size = ifelse(accident_month + development_month > 132, NA, size)) %>% arrange(development_month) %>% pivot_wider(names_from = development_month, values_from = size) %>% arrange(accident_month) triangle_month <- as.matrix(triangle_month[, 2:132]) ``` --- # Monthly chain ladder <br> ```r cl <- MackChainLadder(incr2cum(triangle_month)) summary(cl)$Totals ``` ``` ## Totals ## Latest: 6.172136e+07 ## Dev: 9.697826e-01 ## Ultimate: 6.364453e+07 ## IBNR: 1.923173e+06 ## Mack S.E.: 1.035489e+05 ## CV(IBNR): 5.384277e-02 ``` Comparing the yearly and monthly chain ladder: ``` ## method cells reserve.estimate error pct.error ## 1 Yearly chain ladder 66 1734146 -415165.1 -19.31619 ## 2 Monthly chain ladder 8778 1923173 -226139.0 -10.52146 ``` The monthly chain ladder predicts the reserve more accurately on this data set. --- # Grouping claims by occurrence month We group claims based on the occurrence month. For each of these groups we construct a .hi-pink[separate runoff triangle] and estimate the reserve using a .hi-pink[yearly chain ladder]. .pull-left[ ```r require(lubridate) claims <- observed_daily %>% group_by(accident_number) %>% slice(1) %>% ungroup() %>% mutate(start_month = floor_date(accident_date, unit = 'month'), time = as.numeric(accident_date - start_month) / 31, accident_year = format(accident_date, '%Y'), reporting_year = format(reporting_date, '%Y'), month = format(accident_date, '%B')) %>% filter(accident_year == reporting_year) ggplot(claims) + facet_wrap( ~ month, ncol = 3) + theme_bw() + geom_density(aes(x = time, group = factor(accident_year), color = factor(accident_year))) ``` ] .pull-right[ We are comparing occurrences in the same month accross different years. We are no longer comparing different months within one calendar year. <!-- --> ] --- # Grouping claims by occurrence month .pull-left[ Add the accident date to the data set `reserving_yearly` ```r reserving_yearly <- reserving_yearly %>% left_join(reserving_daily %>% group_by(accident_number) %>% slice(1) %>% ungroup() %>% select(accident_number, accident_date)) reserving_yearly <- reserving_yearly %>% mutate(accident_month = format(accident_date, '%B')) ``` Create a runoff triangle for each accident month ```r triangles <- reserving_yearly %>% group_by(accident_month, accident_year, development_year) %>% summarise(size = sum(size)) %>% ungroup() %>% complete(expand.grid(accident_month = unique(accident_month), accident_year = 0:10, development_year = 1:11), fill = list(size = 0)) %>% mutate(size = ifelse(accident_year + development_year > 11, NA, size)) ``` ] .pull-right[ ```r triangles %>% filter(accident_month == 'april') %>% arrange(development_year) %>% pivot_wider(names_from = development_year, values_from = size) ``` .tiny[ ``` ## # A tibble: 11 x 13 ## accident_month accident_year DY.1 DY.2 DY.3 DY.4 DY.5 DY.6 DY.7 DY.8 DY.9 DY.10 DY.11 ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 april 0 406625. 70060. 8968. 4503. 2032. 0 0 0 0 0 0 ## 2 april 1 392370. 39237. 0 0 0 0 0 0 0 0 NA ## 3 april 2 490615. 62149. 0 0 0 0 0 0 0 NA NA ## 4 april 3 392533. 30396. 3977. 0 0 0 0 0 NA NA NA ## 5 april 4 426086. 40524. 19473. 5444. 5613. 0 0 NA NA NA NA ## 6 april 5 423923. 43166. 3460. 0 0 0 NA NA NA NA NA ## 7 april 6 411784. 23560. 9287. 0 0 NA NA NA NA NA NA ## 8 april 7 393665. 36667. 0 0 NA NA NA NA NA NA NA ## 9 april 8 394125. 30951. 0 NA NA NA NA NA NA NA NA ## 10 april 9 388587. 24324. NA NA NA NA NA NA NA NA NA ## 11 april 10 202582. NA NA NA NA NA NA NA NA NA NA ``` ] ] --- # Grouping claims by occurrence month (continued) We use the `glm` routine to apply the chain ladder method to each triangle: `factor(development_year) * accident_month` estimates a different effect `factor(development)` for each `accident_month` ```r fit <- glm(size ~ factor(development_year) * accident_month + factor(accident_year) * accident_month, data = triangles, family = poisson(link = 'log')) reserve_group <- sum(predict(fit, newdata = triangles %>% filter(is.na(size)), type = 'response')) reserve_group ``` ``` ## [1] 1928050 ``` We obtain a reserve estimate similar to the one obtained with the monthly chain ladder ``` ## method cells reserve.estimate error pct.error ## 1 Yearly chain ladder 66 1734146 -415165.1 -19.31619 ## 2 Monthly chain ladder 8778 1923173 -226139.0 -10.52146 ## 3 Grouped chain ladder 792 1928050 -221261.9 -10.29455 ``` --- class: inverse, center, middle name: research_outlook # Research outlook <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Research outlook <img src="image/reserving_methods.png" width="67%" style="display: block; margin: auto;" /> --- # Development of an individual claim <br> <img src="image/development_claim.png" width="80%" style="display: block; margin: auto;" /> --- # Development of an individual claim (continued) <img src="image/lexis_claim.png" width="60%" style="display: block; margin: auto;" /> Observed claims are .hi-pink[censored] due to .hi-pink[delays] (reporting, settlement) in the claim development process. .KULbginline[IBNR]: future costs for claims that occurred, but are not yet reported (claim B) <br> .KULbginline[RBNS]: future costs for claims that are reported, but are not yet settled (claim C) <br> .KULbginline[Pricing]: all costs for claims that will occur in .hi-pink[future] insured .hi-pink[exposure periods] (claim D). --- # The IBNR reserve <br> Following a frequency-severity decomposition, the IBNR reserve is the product of the expected .hi-pink[number of unreported claims] times the .hi-pink[expected severity per claim]. Hereby: * insurance pricing covers the estimation of claim severity for new claims * reserving methods focus on predicting the .hi-pink[number of unreported claims]. <br> Our work on the topic: * "Modeling the number of hidden events subject to observation delay". Jonas Crevecoeur, Katrien Antonio and Roel Verbelen. (2019). European Journal of Operational Research. * "Modeling the occurrence of events subject to a reporting delay via an EM algorithm". Roel Verbelen, Katrien Antonio, Gerda Claeskens and Jonas Crevecoeur. (2021). Statistical Science. --- # The IBNR reserve (continued) <br> Number of observed claim occurrences per day <br> <img src="image/ClaimCountDayNoOutlierMonthlyAvg.png" width="80%" style="display: block; margin: auto;" /> --- # The IBNR reserve (continued) <br> Number of reported claims per day <br> <img src="image/ReportCountDayMonthlyAvg.png" width="80%" style="display: block; margin: auto;" /> Almost no claims are reported on Saturdays, Sundays and holidays. --- # The IBNR reserve (continued) <br> Total number of unreported claims on each day, i.e. the number of claims that occur before the evaluation date, but are reported afterwards. <br> <img src="image/TotalIBNRMonthPlusZoom.png" width="70%" style="display: block; margin: auto;" /> The number of unreported claims increases during the weekend (+10% on sunday) and around the end of the year (+30%). --- # The RBNS reserve <br> Predict the future costs of individual, reported claims. <br> Our work on the topic: * "A hierarchical reserving model for non-life insurance claims". Jonas Crevecoeur and Katrien Antonio. (2021). Working paper, under review. * "Bridging the gap between pricing and reserving with an occurrence and development model for non-life insurance claims". Katrien Antonio and Jonas Crevecoeur. (2021). Working paper. <br> Methods implemented in the R package {hirem}, covering tools for hierarchical reserving models, available on [GitHub](https://github.com/jonascrevecoeur/hirem). --- # The RBNS reserve (continued) The hierarchical model is based on .hi-pink[two key ideas]: 1. Model the development of indiviudal claim in discrete time (steps of one year) using the observed development in previous years. <img src="image/model_development.png" width="70%" style="display: block; margin: auto;" /> --- # The RBNS reserve (continued) The hierarchical model is based on .hi-pink[two key ideas]: 1. Model the development of indiviudal claim in discrete time (steps of one year) using the observed development in previous years. 2. Focus on all events registered over the lifetime of a claim Common events registered over the lifetime of a claim: * settlement * payments * initial incurred / changes in the incurred * involvement lawyer. These events are .hi-pink[dependent]: * if a claim is .hi-pink[settled], there will be no .hi-pink[payments] in the future * large .hi-pink[payments] are more likely when the outstanding .hi-pink[reserve] is large. --- # Implementation in the {hirem} package <br> Implementation readily available from the {hirem} package on [GitHub](https://github.com/jonascrevecoeur/hirem). <br> Events are added to the model as .hi-pink[layers]. ```r require(hirem) individual_data <- individual_data %>% mutate(calendar_year = accident_year + development_year - 1, close = (settlement_year == calendar_year)) model <- hirem(individual_data %>% filter(calendar_year <= settlement_year)) %>% layer_glm('close', binomial(link = logit)) %>% layer_glm('payment', binomial(link = logit)) %>% layer_glm('size', Gamma(link = log), filter = function(data){data$payment == 1}) model <- fit(model, close = 'close ~ factor(development_year)', payment = 'payment ~ close + factor(development_year)', size = 'size ~ close + factor(development_year)') ``` --- # Thanks! <br> <br> <br> <br> Slides created with the R package [xaringan](https://github.com/yihui/xaringan). <br> <br> <br> Course material available via <br> <svg style="height:0.8em;top:.04em;position:relative;fill:#116E8A;" viewBox="0 0 496 512"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg> https://github.com/katrienantonio/workshop-loss-reserv-fraud