---
title: "Pricing workflow building blocks"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Pricing workflow building blocks}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

`insurancerating` provides building blocks for common actuarial pricing tasks in
GLM-based tariff analysis. The package does not prescribe a single pricing
method. Instead, it supports practical steps that often appear in insurance
pricing work: portfolio analysis, model interpretation, tariff refinement and
model validation.

This vignette gives a compact overview of those building blocks and how they
can be combined.

```{r}
library(insurancerating)
```

## 1. Start with portfolio experience

A pricing analysis often starts by checking how the observed portfolio behaves
by risk factor. This is useful before modelling, but also later when reviewing
whether fitted relativities are plausible.

`factor_analysis()` summarises exposure, claim frequency, average severity,
risk premium and related metrics by one or more risk factors.

```{r}
fa <- factor_analysis(
  MTPL,
  risk_factors = "zip",
  claim_count = "nclaims",
  claim_amount = "amount",
  exposure = "exposure"
)

head(fa)
```

The output helps answer practical questions such as:

- where exposure is concentrated
- whether observed differences are credible or noisy
- whether a segment is driven by a small number of claims
- which risk factors may need closer modelling or refinement

For numeric variables with long or skewed tails, `outlier_histogram()` can help
inspect extreme observations before fitting severity models or constructing
tariff segments.

```{r}
outlier_histogram(
  MTPL2,
  x = "premium",
  upper = 100,
  density = FALSE
)
```

## 2. Assess large losses

Large claims can dominate severity and pure premium analysis. In capped
severity workflows, it is often useful to assess a cap first, decompose the
historical claim amounts, and then decide how the excess burden should be
allocated. A high threshold keeps more of each large claim in the segment's own
experience, which increases pricing responsiveness but introduces volatility. A
low threshold improves stability but may understate structural differences
between segments.

```{r, eval = FALSE}
thresholds <- assess_excess_threshold(
  claims,
  claim_amount = "claim_amount",
  thresholds = c(50000, 100000, 150000),
  exposure = "earned_exposure",
  group = "sector"
)

autoplot(thresholds, y = "premium_impact")
```
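
To get a feel for what a threshold comparison involves, the base R sketch below
counts the claims above each candidate threshold and the share of total claim
amount that would move into the excess layer. The claim amounts are made up,
and the two measures are only examples; they are not necessarily the metrics
reported by `assess_excess_threshold()`.

```{r}
# Hypothetical claim amounts and candidate thresholds.
toy_amounts <- c(2000, 8000, 40000, 120000, 300000)
toy_thresholds <- c(50000, 100000, 150000)

toy_threshold_view <- data.frame(
  threshold = toy_thresholds,
  # Number of claims that exceed the threshold.
  n_excess_claims = sapply(
    toy_thresholds,
    function(thr) sum(toy_amounts > thr)
  ),
  # Share of the total claim amount that lies above the threshold.
  excess_share = sapply(
    toy_thresholds,
    function(thr) sum(pmax(toy_amounts - thr, 0)) / sum(toy_amounts)
  )
)

toy_threshold_view
```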

After choosing a threshold, `calculate_excess_loss()` creates a deterministic
historical decomposition. It does not bootstrap or allocate anything.

```{r, eval = FALSE}
excess <- calculate_excess_loss(
  claims,
  claim_amount = "claim_amount",
  threshold = 100000
)
```
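
Conceptually, a deterministic decomposition of this kind splits each claim into
a part up to the threshold and a part above it. The sketch below shows that
split in base R on made-up amounts. The column name `capped_claim_amount` is a
hypothetical choice for illustration, and the actual output of
`calculate_excess_loss()` may contain other columns.

```{r}
# Hypothetical claim amounts and threshold.
toy_claims <- data.frame(claim_amount = c(20000, 100000, 250000))
toy_threshold <- 100000

# Part of each claim up to the threshold, and the part above it.
toy_claims$capped_claim_amount <- pmin(toy_claims$claim_amount, toy_threshold)
toy_claims$excess_claim_amount <- pmax(
  toy_claims$claim_amount - toy_threshold,
  0
)

# The two parts add back to the original claim amount.
all(
  toy_claims$capped_claim_amount + toy_claims$excess_claim_amount ==
    toy_claims$claim_amount
)

toy_claims
```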

The allocation step is where sharing and uncertainty are handled. Portfolio
allocation is stable but ignores group experience. Risk-factor allocation is
responsive but can be volatile. Partial allocation balances portfolio stability, group
responsiveness and the credibility of observed excess experience.

```{r, eval = FALSE}
allocation <- allocate_excess_loss(
  excess,
  excess_amount = "excess_claim_amount",
  allocation_weight = "earned_exposure",
  risk_factor = "sector",
  allocation = "partial",
  preserve_total_excess = TRUE
)

summary(allocation, compare_to_empirical = TRUE)
autoplot(allocation, y = "allocated_loading")
autoplot(allocation, y = "credibility")
```

In the allocation output, `allocated_excess_loss` is the absolute monetary
burden assigned to a row. `allocated_loading` is the corresponding loading per
unit of the chosen weight, such as earned exposure. This distinction matters
when the output is added back to pricing data.
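
The sketch below illustrates both ideas on made-up group totals: a
credibility-weighted blend of group and portfolio loadings, and the relation
between a loading and the absolute amount. The credibility formula and the
constant `toy_k` are assumptions chosen for illustration; this is a generic
blend, not necessarily the calculation used by `allocate_excess_loss()`.

```{r}
# Toy group totals; names and values are illustrative only.
toy_alloc <- data.frame(
  sector   = c("retail", "industry", "transport"),
  exposure = c(800, 150, 50),
  excess   = c(40000, 45000, 15000)
)

# Loading per unit of exposure: own experience versus the whole portfolio.
toy_alloc$group_loading <- toy_alloc$excess / toy_alloc$exposure
toy_portfolio_loading <- sum(toy_alloc$excess) / sum(toy_alloc$exposure)

# Hypothetical credibility weight: more exposure, more weight on own experience.
toy_k <- 200
toy_alloc$credibility <- toy_alloc$exposure / (toy_alloc$exposure + toy_k)

# Blend, then rescale so the allocated amounts add up to the total excess.
toy_blend <- toy_alloc$credibility * toy_alloc$group_loading +
  (1 - toy_alloc$credibility) * toy_portfolio_loading
toy_scale <- sum(toy_alloc$excess) / sum(toy_blend * toy_alloc$exposure)
toy_alloc$allocated_loading <- toy_blend * toy_scale

# Absolute amount = loading per unit of weight x weight.
toy_alloc$allocated_excess_loss <-
  toy_alloc$allocated_loading * toy_alloc$exposure

toy_alloc
```

The rescaling step plays the role that a total-preserving option would play:
after blending, the allocated amounts sum to the observed total excess again.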

The allocated loading can then be added to the pricing data with
`apply_excess_loading()`.

```{r, eval = FALSE}
excess$base_premium <- excess$technical_premium
priced <- apply_excess_loading(
  excess,
  allocation,
  base_premium = "base_premium"
)
```

This excess loading is part of the technical risk premium. It is not intended
as a commercial margin.

## 3. Translate continuous factors into tariff segments

Many tariffs use grouped versions of continuous variables such as age, vehicle
age or insured value. `risk_factor_gam()` can be used to inspect the fitted
shape of a continuous risk factor. `derive_tariff_segments()` can then derive
candidate segment boundaries from that pattern.

```{r}
age_gam <- risk_factor_gam(
  data = MTPL,
  claim_count = "nclaims",
  risk_factor = "age_policyholder",
  exposure = "exposure"
)

age_segments <- derive_tariff_segments(age_gam)
age_segments
```

The derived segments can be added back to the portfolio with
`add_tariff_segments()`.

```{r}
portfolio <- MTPL |>
  add_tariff_segments(age_segments, name = "age_policyholder_segment")

head(portfolio[, c("age_policyholder", "age_policyholder_segment")])
```

These functions are intended to support actuarial judgement, not replace it.
Candidate segment boundaries should still be reviewed for credibility,
stability and practical usability.
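
When reviewing candidate boundaries by hand, it can help to see how a given set
of breaks maps individual values to segments. The sketch below uses base R's
`cut()` with made-up ages and hypothetical boundaries; it is independent of the
objects created above and does not reproduce the output format of
`add_tariff_segments()`.

```{r}
# Hypothetical ages and candidate segment boundaries.
toy_age <- c(18, 23, 31, 45, 67, 82)
toy_breaks <- c(18, 25, 40, 65, 95)

# Left-closed intervals; include.lowest = TRUE also closes the last
# interval on the right, so the maximum break value is not dropped.
toy_segment <- cut(
  toy_age,
  breaks = toy_breaks,
  right = FALSE,
  include.lowest = TRUE
)

# Number of observations per candidate segment.
table(toy_segment)
```

Counting observations (or summing exposure) per segment in this way is a quick
check on whether each candidate segment is large enough to be credible.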

## 4. Fit and interpret a GLM

GLMs are widely used in insurance pricing because, with a log link, they
provide an interpretable multiplicative structure. After fitting a model,
`rating_table()` expresses the coefficients in tariff-table form.

```{r}
portfolio$zip <- as.factor(portfolio$zip)

freq_model <- glm(
  nclaims ~ zip + age_policyholder_segment + offset(log(exposure)),
  family = poisson(),
  data = portfolio
)

rt <- rating_table(
  freq_model,
  model_data = portfolio,
  exposure = "exposure"
)

head(rt$df)
```
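
The multiplicative reading of a log-link GLM can be checked directly in base R:
exponentiating the intercept gives the frequency of the reference level, and
exponentiating the other coefficients gives relativities. The sketch below uses
simulated data with hypothetical names (`toy_glm_data`, `area`) and does not
depend on `rating_table()`.

```{r}
# Simulated toy portfolio; names and parameter values are illustrative only.
set.seed(1)
toy_n <- 2000
toy_glm_data <- data.frame(
  area     = factor(sample(c("rural", "urban"), toy_n, replace = TRUE)),
  exposure = runif(toy_n, 0.5, 1)
)
toy_rate <- 0.08 * ifelse(toy_glm_data$area == "urban", 1.5, 1)
toy_glm_data$nclaims <- rpois(toy_n, toy_rate * toy_glm_data$exposure)

toy_fit <- glm(
  nclaims ~ area + offset(log(exposure)),
  family = poisson(),
  data = toy_glm_data
)

# exp(intercept): frequency of the reference level ("rural").
toy_base <- exp(coef(toy_fit)[["(Intercept)"]])
# exp(coefficient): relativity of "urban" against the reference level.
toy_relativity <- exp(coef(toy_fit)[["areaurban"]])

# Fitted frequency per unit of exposure = base frequency x relativity.
c(
  base_frequency   = toy_base,
  urban_relativity = toy_relativity,
  urban_frequency  = toy_base * toy_relativity
)
```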

Observed portfolio experience from `factor_analysis()` can be attached to the
rating table with `add_observed_experience()`. This makes the comparison between
model relativities and observed experience explicit.

```{r}
zip_experience <- factor_analysis(
  portfolio,
  risk_factors = "zip",
  claim_count = "nclaims",
  exposure = "exposure"
)

rt |>
  add_observed_experience(zip_experience, metric = "frequency") |>
  autoplot(risk_factors = "zip")
```
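
A useful reference point for this comparison: in a Poisson GLM with a log link,
an exposure offset and a single risk factor, the fitted relativities coincide
with the observed frequency ratios. Differences between the two in a
multi-factor model therefore reflect the adjustment for the other risk factors.
The base R sketch below shows the single-factor case on made-up data; all names
in it are hypothetical.

```{r}
# Toy data with one risk factor; values are illustrative only.
toy_obs <- data.frame(
  zone     = factor(c("z1", "z1", "z2", "z2", "z3", "z3")),
  nclaims  = c(3, 5, 10, 6, 2, 1),
  exposure = c(40, 60, 50, 50, 30, 20)
)

toy_obs_fit <- glm(
  nclaims ~ zone + offset(log(exposure)),
  family = poisson(),
  data = toy_obs
)

# Observed frequency per level, expressed relative to the first level.
toy_totals <- aggregate(
  cbind(nclaims, exposure) ~ zone,
  data = toy_obs,
  FUN = sum
)
toy_totals$observed_frequency <- toy_totals$nclaims / toy_totals$exposure
toy_totals$observed_relativity <-
  toy_totals$observed_frequency / toy_totals$observed_frequency[1]

# Fitted relativities: 1 for the reference level, exp(coefficient) otherwise.
toy_totals$fitted_relativity <- c(1, unname(exp(coef(toy_obs_fit)[-1])))

toy_totals
```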

## 5. Refine tariff effects when needed

Raw model output may be statistically valid but still unsuitable for direct
tariff use. Sparse levels, noisy estimates or non-monotonic adjacent effects can
make a tariff hard to explain or maintain.

The refinement workflow makes these adjustments explicit:

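```{r, eval = FALSE}
# `restrictions` is assumed to be a user-supplied object with the restricted
# relativities; it is not created in this vignette.
refined_model <- prepare_refinement(freq_model) |>
  # Smooth the segment-level effects along the underlying continuous variable.
  add_smoothing(
    model_variable = "age_policyholder_segment",
    source_variable = "age_policyholder",
    weights = "exposure"
  ) |>
  add_restriction(restrictions) |>
  # Refit so the remaining coefficients adjust to the refinements.
  refit()
```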

Common refinement tasks include:

- smoothing adjacent tariff levels
- fixing selected coefficients to actuarial or commercial assumptions
- applying sublevel relativities within a broader GLM factor level
- refitting the model while preserving the intended tariff structure

These tools are most useful when the statistical model already captures the main
risk structure and the remaining work is tariff refinement.
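
One generic way to fix selected relativities while re-estimating the rest is to
move the fixed effect into the model offset. The base R sketch below shows that
idea on made-up data. The relativities in `toy_fixed` are hypothetical, and the
sketch is not a description of how `add_restriction()` or `refit()` are
implemented.

```{r}
# Toy data with two risk factors; values are illustrative only.
toy_restr <- data.frame(
  zone     = factor(rep(c("z1", "z2", "z3"), each = 2)),
  fuel     = factor(rep(c("petrol", "diesel"), times = 3)),
  nclaims  = c(4, 6, 9, 12, 3, 5),
  exposure = c(50, 50, 60, 60, 40, 40)
)

# Hypothetical relativities that should be imposed on `zone`.
toy_fixed <- c(z1 = 1, z2 = 1.5, z3 = 0.9)
toy_restr$zone_fixed <- unname(toy_fixed[as.character(toy_restr$zone)])

# `zone` leaves the linear predictor; its fixed effect enters via the offset,
# so only the intercept and the `fuel` effect are re-estimated.
toy_restricted_fit <- glm(
  nclaims ~ fuel + offset(log(exposure) + log(zone_fixed)),
  family = poisson(),
  data = toy_restr
)

exp(coef(toy_restricted_fit))
```

Because the model keeps an intercept, the total number of fitted claims still
matches the observed total after the restriction is imposed.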

## 6. Validate model behaviour

Pricing models should be checked before their output is used in a tariff.
`insurancerating` contains helpers for several common checks:

- `check_overdispersion()` for Poisson frequency models
- `check_residuals()` for simulation-based residual diagnostics using DHARMa
- `bootstrap_performance()` for predictive stability with metrics such as RMSE
- `rating_grid()` to inspect observed rating-grid combinations

For example:

```{r}
check_overdispersion(freq_model)
```

```{r, eval = FALSE}
check_residuals(freq_model) |>
  autoplot()
```
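
For intuition about what an overdispersion check measures, the sketch below
computes the Pearson dispersion statistic in base R: the sum of squared Pearson
residuals divided by the residual degrees of freedom. Values well above 1
suggest more variance than a Poisson model assumes. The data are simulated with
hypothetical names, and `check_overdispersion()` may report a different or more
complete set of statistics.

```{r}
# Simulated counts with more variance than Poisson (negative binomial).
set.seed(42)
toy_od <- data.frame(exposure = runif(1000, 0.5, 1))
toy_od$nclaims <- rnbinom(1000, mu = 1 * toy_od$exposure, size = 0.5)

toy_od_fit <- glm(
  nclaims ~ 1 + offset(log(exposure)),
  family = poisson(),
  data = toy_od
)

# Pearson dispersion: squared Pearson residuals over residual degrees of freedom.
toy_dispersion <- sum(residuals(toy_od_fit, type = "pearson")^2) /
  df.residual(toy_od_fit)

toy_dispersion
```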

Validation does not make a tariff decision by itself. It gives evidence about
model fit, stability and areas that may need further review.
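
Stability can also be examined by refitting the model on resampled data. The
base R sketch below draws bootstrap samples from a simulated portfolio, refits
a Poisson GLM on each sample and records the RMSE on the original data. The
data, the number of replicates and the choice to score on the original data are
assumptions for illustration; `bootstrap_performance()` may follow a different
procedure.

```{r}
# Simulated toy portfolio; names and parameter values are illustrative only.
set.seed(123)
toy_bs <- data.frame(exposure = runif(300, 0.5, 1), x = rnorm(300))
toy_bs$nclaims <- rpois(300, exp(-2 + 0.3 * toy_bs$x) * toy_bs$exposure)

toy_rmse <- replicate(50, {
  # Resample rows with replacement and refit the model on the resample.
  idx <- sample(nrow(toy_bs), replace = TRUE)
  fit <- glm(
    nclaims ~ x + offset(log(exposure)),
    family = poisson(),
    data = toy_bs[idx, ]
  )
  # Score the refitted model on the original data.
  pred <- predict(fit, newdata = toy_bs, type = "response")
  sqrt(mean((toy_bs$nclaims - pred)^2))
})

# Spread of the RMSE across bootstrap replicates.
quantile(toy_rmse, c(0.025, 0.5, 0.975))
```

A narrow spread indicates that the fit does not depend heavily on which
policies happen to be in the sample.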

## Typical workflow

One possible workflow is:

1. Inspect the portfolio with `factor_analysis()` and `outlier_histogram()`.
2. Assess large-loss thresholds with `assess_excess_threshold()` where capped
   severity or excess-loss loadings are relevant.
3. Decompose and allocate excess loss with `calculate_excess_loss()` and
   `allocate_excess_loss()`.
4. Analyse continuous risk factors with `risk_factor_gam()`.
5. Create candidate tariff segments with `derive_tariff_segments()`.
6. Fit GLMs for frequency, severity or pure premium.
7. Interpret coefficients with `rating_table()`.
8. Compare fitted relativities with observed experience using
   `add_observed_experience()`.
9. Apply refinement where needed with `prepare_refinement()`, `add_smoothing()`,
   `add_restriction()` or `add_relativities()`.
10. Validate the resulting model with the model-performance helpers.

The exact order and choice of functions depend on the portfolio, product,
data quality and pricing objective.

## Next steps

For a worked example, see:

- [Getting started](getting-started.html)

For coefficient refinement:

- [Refinement building blocks](refinement-workflow.html)

For validation:

- [Model validation](model-validation.html)