insurancerating provides actuarial building blocks for
insurance pricing in R.
A GLM-based pricing exercise typically combines several tasks.
This vignette illustrates one way to combine the main building blocks:
factor_analysis(), glm(), rating_table(), model_performance(), and
bootstrap_performance().
The focus is on the transition from portfolio data to an interpretable tariff structure.
We use the example dataset MTPL2, which contains a motor
portfolio with claim counts (nclaims), exposure (exposure), premium (premium), and claim amounts (amount).
library(insurancerating)
library(dplyr)
head(MTPL2)
#> # A tibble: 6 × 6
#> customer_id area nclaims amount exposure premium
#> <int> <int> <int> <int> <dbl> <int>
#> 1 92617 2 0 0 1 90
#> 2 120632 2 0 0 1 82
#> 3 147800 2 0 0 1 47
#> 4 29763 3 0 0 0.0630 44
#> 5 61107 1 1 6066 1 69
#> 6 4318 3 0 0 1 66
A pricing analysis often starts with an analysis of the portfolio.
Before fitting a model, it is necessary to understand how the portfolio behaves across its risk factors.
This is done with factor_analysis().
We start by analysing a single risk factor.
fa <- factor_analysis(
  MTPL,
  risk_factors = "zip",
  claim_count = "nclaims",
  exposure = "exposure",
  claim_amount = "amount"
)
fa
#> zip amount nclaims exposure frequency average_severity risk_premium
#> 1 1 116178669 1593 11080.6274 0.1437644 72930.74 10484.846
#> 2 2 59751985 1008 7782.6301 0.1295192 59277.76 7677.608
#> 3 3 58988962 1038 7587.5644 0.1368028 56829.44 7774.427
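#> 4 0 821510 29 206.8438 0.1402024 28327.93 3971.644
The output provides commonly used portfolio metrics such as frequency, average severity, and risk premium.
This provides a direct view of how the claims experience differs across the levels of the risk factor.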
At this stage, the purpose is not yet to fit a model, but to understand whether the factor behaves in a way that is suitable for pricing.
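The three metrics in the output above are simple ratios of the aggregated columns. The following base R sketch recomputes them from the figures printed above, as a way to check the definitions; the data frame name fa_check is introduced here for illustration only.

```r
# Aggregated figures copied from the factor_analysis() output above
fa_check <- data.frame(
  zip      = c(1, 2, 3, 0),
  amount   = c(116178669, 59751985, 58988962, 821510),
  nclaims  = c(1593, 1008, 1038, 29),
  exposure = c(11080.6274, 7782.6301, 7587.5644, 206.8438)
)

# Claim frequency: number of claims per unit of exposure
fa_check$frequency <- fa_check$nclaims / fa_check$exposure

# Average severity: claim amount per claim
fa_check$average_severity <- fa_check$amount / fa_check$nclaims

# Risk premium: claim amount per unit of exposure
fa_check$risk_premium <- fa_check$amount / fa_check$exposure

fa_check
```

By construction, the risk premium equals frequency multiplied by average severity, which is the relationship the later modelling steps rely on.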
Continuous variables are typically not used directly in a tariff. In pricing practice, they are usually analysed first and then grouped into tariff segments.
This ensures that the final tariff remains interpretable and implementable.
age_freq <- risk_factor_gam(
  data = MTPL,
  risk_factor = "age_policyholder",
  claim_count = "nclaims",
  exposure = "exposure"
)
autoplot(age_freq, show_observations = TRUE)
This step is used to inspect how claim frequency varies with the continuous risk factor.
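For readers who want to see the general idea behind a smooth frequency curve, the sketch below fits a Poisson GAM with the mgcv package on simulated data. It is a generic illustration, not the implementation of risk_factor_gam(); the data, the assumed age effect, and all object names are made up for this example.

```r
library(mgcv)

set.seed(42)
n <- 2000

# Simulated portfolio (illustrative only): age and exposure per policy
toy <- data.frame(
  age = sample(18:95, n, replace = TRUE),
  exposure = runif(n, 0.25, 1)
)

# Simulated claim counts: frequency decreases smoothly with age (made-up relationship)
toy$nclaims <- rpois(n, toy$exposure * 0.3 * exp(-0.02 * (toy$age - 18)))

# Poisson GAM with a smooth age effect; log(exposure) enters as an offset
gam_fit <- gam(nclaims ~ s(age) + offset(log(exposure)),
               family = poisson(), data = toy)

# Predicted claim frequency per unit of exposure over an age grid
grid <- data.frame(age = 18:95, exposure = 1)
grid$frequency <- as.numeric(predict(gam_fit, newdata = grid, type = "response"))

head(grid)
```

Setting exposure to 1 in the prediction grid makes the offset vanish, so the predictions can be read as annual frequencies.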
derive_tariff_segments() then converts the continuous variable into risk-homogeneous tariff segments.
The resulting segments should reflect differences in risk, while remaining suitable for use in a tariff.
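Once breakpoints are known, mapping a continuous variable to segments is a binning operation. The base R sketch below applies the breakpoints that appear as age_cat levels in the rating table of this vignette. It only illustrates the binning itself, not how derive_tariff_segments() chooses the breakpoints, and the ages are hypothetical.

```r
# Breakpoints read off the age_cat levels shown in the rating table of this vignette
breaks <- c(18, 25, 32, 39, 51, 58, 65, 84, 95)

# Hypothetical policyholder ages, for illustration only
age <- c(18, 25, 26, 40, 51, 66, 90)

# Right-closed intervals; include.lowest = TRUE keeps age 18 in the first segment
age_cat <- cut(age, breaks = breaks, include.lowest = TRUE)

data.frame(age, age_cat)
```

Each policy then carries a factor level instead of a raw age, which is the form a tariff table can use directly.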
# Derive segments from the fitted GAM (assumed call signature)
age_segments <- derive_tariff_segments(age_freq)

dat <- MTPL |>
  add_tariff_segments(age_segments, name = "age_cat") |>
  mutate(across(where(is.character), as.factor)) |>
  mutate(across(where(is.factor), ~ set_reference_level(., exposure)))
set_reference_level() sets the reference level to the
level with the highest exposure. In pricing models, this is often the
most stable and interpretable baseline.
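The idea of an exposure-based reference level can be sketched in base R as follows. This is an illustration of the concept on a hypothetical mini-portfolio; the internals of set_reference_level() may differ.

```r
# Hypothetical mini-portfolio, for illustration only
toy <- data.frame(
  zip = factor(c("0", "1", "1", "2", "2", "3")),
  exposure = c(0.2, 1.0, 0.8, 0.5, 0.4, 0.6)
)

# Total exposure per factor level
exposure_by_level <- tapply(toy$exposure, toy$zip, sum)

# Make the level with the highest total exposure the reference (first) level
largest <- names(which.max(exposure_by_level))
toy$zip <- relevel(toy$zip, ref = largest)

levels(toy$zip)
```

In a GLM, the first level of a factor is absorbed into the intercept, so all other relativities are expressed against the largest segment.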
Generalized linear models are widely used in insurance pricing because they:
A common decomposition is to model claim frequency and claim severity separately and combine them into a risk premium.
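A frequency model and a severity model of this kind can be sketched with base R glm() as follows. The data are simulated and the object names (toy, freq_toy, sev_toy) are introduced for this example only; the model formulas used in the vignette itself may differ.

```r
set.seed(1)
n <- 5000

# Simulated portfolio, for illustration only
toy <- data.frame(
  zip = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
  exposure = runif(n, 0.25, 1)
)
toy$nclaims <- rpois(n, 0.14 * toy$exposure)
# Total claim amount per policy; zero when there are no claims
toy$amount <- toy$nclaims * rgamma(n, shape = 2, scale = 500)

# Frequency: Poisson GLM with log link and log(exposure) as offset
freq_toy <- glm(nclaims ~ zip + offset(log(exposure)),
                family = poisson(), data = toy)

# Severity: Gamma GLM on the average claim amount, weighted by claim count
claims <- subset(toy, nclaims > 0)
claims$avg_amount <- claims$amount / claims$nclaims
sev_toy <- glm(avg_amount ~ zip, family = Gamma(link = "log"),
               weights = nclaims, data = claims)

# With a log link, exponentiated coefficients act as multiplicative relativities
exp(coef(freq_toy))
exp(coef(sev_toy))
```

The offset ensures that expected claim counts scale with exposure, while the weights give policies with more claims more influence on the severity estimate.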
rt <- rating_table(burn_unrestricted)
rt
#> level risk_factor est_burn_unrestricted exposure
#> 1 (Intercept) (Intercept) 9370.4023322 NA
#> 2 0 zip 0.9946246 207
#> 3 1 zip 1.0000000 11081
#> 4 2 zip 1.0049888 7783
#> 5 3 zip 1.0028308 7588
#> 6 [18,25] age_cat 2.3041459 1331
#> 7 (25,32] age_cat 2.4813038 3649
#> 8 (32,39] age_cat 0.9246871 4247
#> 9 (39,51] age_cat 1.0000000 7421
#> 10 (51,58] age_cat 0.5699965 3245
#> 11 (58,65] age_cat 0.5798450 2791
#> 12 (65,84] age_cat 0.7103948 3901
#> 13 (84,95] age_cat 0.5190330 72
rating_table() expresses fitted coefficients in terms of
the original factor levels, including the reference level.
This output is commonly used to inspect tariff relativities.
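Assuming the table above comes from a multiplicative (log-link) model, the fitted value for a given profile is the intercept estimate multiplied by one relativity per risk factor. The sketch below hard-codes the estimates printed above; the helper tariff_premium() is hypothetical and not part of insurancerating.

```r
# Estimates copied from the rating table above
base_premium <- 9370.4023322

zip_rel <- c("0" = 0.9946246, "1" = 1.0000000, "2" = 1.0049888, "3" = 1.0028308)

age_rel <- c(
  "[18,25]" = 2.3041459, "(25,32]" = 2.4813038, "(32,39]" = 0.9246871,
  "(39,51]" = 1.0000000, "(51,58]" = 0.5699965, "(58,65]" = 0.5798450,
  "(65,84]" = 0.7103948, "(84,95]" = 0.5190330
)

# Hypothetical helper: base value times one relativity per risk factor
tariff_premium <- function(zip, age_cat) {
  unname(base_premium * zip_rel[zip] * age_rel[age_cat])
}

# Reference profile: both relativities are 1, so the result is the intercept estimate
tariff_premium("1", "(39,51]")

# Non-reference profile
tariff_premium("2", "(25,32]")
```

Reading the table this way shows why the reference levels carry a relativity of exactly 1.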
A plot of the rating table is typically used to assess:
At this stage, the relevant questions are:
model_performance(mod_freq)
#> # Comparison of Model Performance Indices
#>
#> Model | AIC | BIC | RMSE
#> ---------+----------+-----------+------
#> mod_freq | 22949.04 | 23015.512 | 0.362
This provides summary measures of model fit, such as AIC, BIC, and RMSE.
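The same kind of indices can be computed by hand for any fitted GLM. The sketch below uses base R on simulated data and takes RMSE as the root mean squared difference between observed and fitted claim counts; this definition is an assumption and may differ in detail from what model_performance() reports.

```r
set.seed(1)
n <- 1000

# Simulated portfolio, for illustration only
toy <- data.frame(
  zip = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
  exposure = runif(n, 0.25, 1)
)
toy$nclaims <- rpois(n, 0.14 * toy$exposure)

fit <- glm(nclaims ~ zip + offset(log(exposure)),
           family = poisson(), data = toy)

# RMSE on the response scale: observed versus fitted claim counts
rmse <- sqrt(mean((toy$nclaims - fitted(fit))^2))

c(AIC = AIC(fit), BIC = BIC(fit), RMSE = rmse)
```

AIC and BIC are only meaningful when comparing models fitted to the same data, whereas RMSE is expressed in the units of the response.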
bootstrap_performance() provides a view of predictive stability by evaluating how performance changes across bootstrap samples.
A single fit statistic is usually not sufficient. In pricing practice, it is also relevant to assess whether the model behaves consistently under small data perturbations.
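The idea of bootstrapping a fit statistic can be sketched in base R as follows: resample the portfolio with replacement, refit the model, and record the RMSE each time. The data are simulated, the helper rmse_of_refit() is hypothetical, and the resampling scheme used by bootstrap_performance() may differ.

```r
set.seed(1)
n <- 1000

# Simulated portfolio, for illustration only
toy <- data.frame(
  zip = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
  exposure = runif(n, 0.25, 1)
)
toy$nclaims <- rpois(n, 0.14 * toy$exposure)

# Hypothetical helper: refit the frequency model on a data set and return its RMSE
rmse_of_refit <- function(d) {
  fit <- glm(nclaims ~ zip + offset(log(exposure)),
             family = poisson(), data = d)
  sqrt(mean((d$nclaims - fitted(fit))^2))
}

# 50 bootstrap samples: draw rows with replacement, refit, store the RMSE
boot_rmse <- replicate(50, {
  idx <- sample(nrow(toy), replace = TRUE)
  rmse_of_refit(toy[idx, ])
})

summary(boot_rmse)
quantile(boot_rmse, c(0.025, 0.975))
```

A narrow spread of the bootstrapped RMSE values suggests that the fit does not depend heavily on a few individual policies.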
At this point, the example has produced:
In many cases, a further step is required before the model output can be used as a tariff.
Typical reasons include:
This can be handled with the refinement tools described in Refinement building blocks.
A possible sequence in insurancerating is:
factor_analysis() # analyse portfolio behaviour
risk_factor_gam() # analyse continuous variables
derive_tariff_segments() # derive tariff segments
glm() # estimate pricing models
rating_table() # interpret fitted coefficients
bootstrap_performance() # assess stability
prepare_refinement() # refine tariff structure if needed
The aim is to move from raw portfolio data to a tariff structure that is interpretable and implementable.
The following vignette covers the refinement step in more detail:
For the conceptual background to exposure, risk premium, and tariff design, see: