Credibility theory features of actuar


Christophe Dutang (Université du Mans), Vincent Goulet (Université Laval), Xavier Milhaud (Université Claude Bernard Lyon 1), Mathieu Pigeon (Université du Québec à Montréal)

1 Introduction

Credibility models are actuarial tools to distribute premiums fairly among a heterogeneous group of policyholders (henceforth called entities). More generally, they can be seen as prediction methods applicable in any setting where repeated measures are made for subjects with different risk levels.

The credibility theory features of actuar consist of matrix hachemeister containing the famous data set of Hachemeister (1975) and function cm to fit hierarchical (including Bühlmann, Bühlmann-Straub) and regression credibility models. Furthermore, function simul can simulate portfolios of data satisfying the assumptions of the aforementioned credibility models; see the "simulation" vignette for details.

2 Hachemeister data set

The data set of Hachemeister (1975) consists of private passenger bodily injury insurance average claim amounts, and the corresponding number of claims, for five U.S. states over 12 quarters between July 1970 and June 1973. The data set is included in the package in the form of a matrix with 5 rows and 25 columns. The first column contains a state index, columns 2-13 contain the claim averages and columns 14-25 contain the claim numbers:

\begin{Sinput}
> data(hachemeister)
> hachemeister
\end{Sinput}
\begin{Soutput}
     state ratio.1 ratio.2 ratio.3 ratio.4 ratio.5 ratio.6
[1,]     1    1738    1642    1794    2051    2079    2234
[2,]     2    1364    1408    1597    1444    1342    1675
[3,]     3    1759    1685    1479    1763    1674    2103
[4,]     4    1223    1146    1010    1257    1426    1532
[5,]     5    1456    1499    1609    1741    1482    1572
     ratio.7 ratio.8 ratio.9 ratio.10 ratio.11 ratio.12
[1,]    2032    2035    2115     2262     2267     2517
[2,]    1470    1448    1464     1831     1612     1471
[3,]    1502    1622    1828     2155     2233     2059
[4,]    1953    1123    1343     1243     1762     1306
[5,]    1606    1735    1607     1573     1613     1690
     weight.1 weight.2 weight.3 weight.4 weight.5 weight.6
[1,]     7861     9251     8706     8575     7917     8263
[2,]     1622     1742     1523     1515     1622     1602
[3,]     1147     1357     1329     1204      998     1077
[4,]      407      396      348      341      315      328
[5,]     2902     3172     3046     3068     2693     2910
     weight.7 weight.8 weight.9 weight.10 weight.11 weight.12
[1,]     9456     8003     7365      7832      7849      9077
[2,]     1964     1515     1527      1748      1654      1861
[3,]     1277     1218      896      1003      1108      1121
[4,]      352      331      287       384       321       342
[5,]     3275     2697     2663      3017      3242      3425
\end{Soutput}

3 Hierarchical credibility model

The linear model fitting function of R is named lm. Since credibility models are very close in many respects to linear models, and since the credibility model fitting function of actuar borrows much of its interface from lm, we named the credibility function cm. Function cm acts as a unified interface for all credibility models supported by the package. Currently, these are the unidimensional models of Bühlmann

(1969) and Bühlmann and Straub (1970), the hierarchical model of Jewell (1975) (of which the first two are special cases) and the regression model of Hachemeister (1975), optionally with the intercept at the barycenter of time (Bühlmann and Gisler, 2005, Section 8.4). The modular design of cm makes it easy to add new models if desired. This subsection concentrates on usage of cm for hierarchical models.

There are some variations in the formulas of the hierarchical model in the literature. We compute the credibility premiums as given in Bühlmann and Jewell (1987) or Bühlmann and Gisler (2005). We support three types of estimators of the between variance structure parameters: the unbiased estimators of Bühlmann and Gisler (2005) (the default), the slightly different version of Ohlsson (2005) and the iterative pseudo-estimators as found in Goovaerts and Hoogstad (1987) or Goulet (1998).

Consider an insurance portfolio where contracts are classified into cohorts. In our terminology, this is a two-level hierarchical classification structure. The observations are claim amounts $S_{ijt}$, where index $i = 1, \dots, I$ identifies the cohort, index $j = 1, \dots, J_i$ identifies the contract within the cohort and index $t = 1, \dots, n_{ij}$ identifies the period (usually a year). To each data point corresponds a weight or volume $w_{ijt}$. Then, the best linear prediction for the next period outcome of a contract based on ratios $X_{ijt} = S_{ijt}/w_{ijt}$ is
\begin{equation}
\begin{split}
\hat{\pi}_{ij} &= z_{ij} X_{ijw} + (1 - z_{ij}) \hat{\pi}_i \\
\hat{\pi}_i    &= z_i X_{izw} + (1 - z_i) m
\end{split}
\tag{1}
\end{equation}
with the credibility factors
\begin{align*}
z_{ij} &= \frac{w_{ij\Sigma}}{w_{ij\Sigma} + s^2/a}, & w_{ij\Sigma} &= \sum_{t=1}^{n_{ij}} w_{ijt} \\
z_i    &= \frac{z_{i\Sigma}}{z_{i\Sigma} + a/b},     & z_{i\Sigma}  &= \sum_{j=1}^{J_i} z_{ij}
\end{align*}
and the weighted averages
\begin{align*}
X_{ijw} &= \sum_{t=1}^{n_{ij}} \frac{w_{ijt}}{w_{ij\Sigma}}\, X_{ijt} \\
X_{izw} &= \sum_{j=1}^{J_i} \frac{z_{ij}}{z_{i\Sigma}}\, X_{ijw}.
\end{align*}
The estimator of $s^2$ is
\begin{equation}
\hat{s}^2 = \frac{1}{\sum_{i=1}^{I} \sum_{j=1}^{J_i} (n_{ij} - 1)}
\sum_{i=1}^{I} \sum_{j=1}^{J_i} \sum_{t=1}^{n_{ij}} w_{ijt} (X_{ijt} - X_{ijw})^2.
\tag{2}
\end{equation}
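As a purely illustrative check of how the two levels of (1) combine (the figures below are made up and do not come from the Hachemeister data), suppose a contract has credibility factor $z_{ij} = 0.75$ and individual weighted average $X_{ijw} = 2000$, while its cohort premium is $\hat{\pi}_i = 1800$. Then
\begin{equation*}
\hat{\pi}_{ij} = 0.75 \times 2000 + (1 - 0.75) \times 1800 = 1500 + 450 = 1950,
\end{equation*}
that is, the contract premium is pulled one quarter of the way from its own experience towards the cohort premium.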

The three types of estimators for parameters $a$ and $b$ are the following. First, let
\begin{align*}
A_i &= \sum_{j=1}^{J_i} w_{ij\Sigma} (X_{ijw} - X_{iww})^2 - (J_i - 1) s^2 \\
B   &= \sum_{i=1}^{I} z_{i\Sigma} (X_{izw} - X_{zzw})^2 - (I - 1) a \\
c_i &= w_{i\Sigma\Sigma} - \sum_{j=1}^{J_i} \frac{w_{ij\Sigma}^2}{w_{i\Sigma\Sigma}} \\
d   &= z_{\Sigma\Sigma} - \sum_{i=1}^{I} \frac{z_{i\Sigma}^2}{z_{\Sigma\Sigma}},
\end{align*}
with
\begin{equation}
X_{zzw} = \sum_{i=1}^{I} \frac{z_{i\Sigma}}{z_{\Sigma\Sigma}}\, X_{izw}.
\tag{3}
\end{equation}
(Hence, $\mathrm{E}[A_i] = c_i a$ and $\mathrm{E}[B] = d b$.) Then, the Bühlmann-Gisler estimators are
\begin{align}
\hat{a} &= \frac{1}{I} \sum_{i=1}^{I} \max\left(\frac{A_i}{c_i}, 0\right) \tag{4} \\
\hat{b} &= \max\left(\frac{B}{d}, 0\right), \tag{5}
\end{align}
the Ohlsson estimators are
\begin{align}
\hat{a} &= \frac{\sum_{i=1}^{I} A_i}{\sum_{i=1}^{I} c_i} \tag{6} \\
\hat{b} &= \frac{B}{d} \tag{7}
\end{align}
and the iterative (pseudo-)estimators are
\begin{align}
\tilde{a} &= \frac{1}{\sum_{i=1}^{I} (J_i - 1)} \sum_{i=1}^{I} \sum_{j=1}^{J_i} z_{ij} (X_{ijw} - X_{izw})^2 \tag{8} \\
\tilde{b} &= \frac{1}{I - 1} \sum_{i=1}^{I} z_i (X_{izw} - X_{zzw})^2, \tag{9}
\end{align}
where
\begin{equation}
X_{zzw} = \sum_{i=1}^{I} \frac{z_i}{z_{\Sigma}}\, X_{izw}.
\tag{10}
\end{equation}
Note the difference between the two weighted averages (3) and (10). See Belhadj et al. (2009) for further discussion on this topic. Finally, the estimator of the collective mean $m$ is $\hat{m} = X_{zzw}$.

The credibility modeling function cm assumes that data is available in the format most practical applications would use, namely a rectangular array (matrix or data frame) with entity observations in the rows and with one or more

classification index columns (numeric or character). One will recognize the output format of simul and its summary methods.

Then, function cm works much the same as lm. It takes in argument: a formula of the form ~terms describing the hierarchical interactions in a data set; the data set containing the variables referenced in the formula; the names of the columns where the ratios and the weights are to be found in the data set. The latter should contain at least two nodes in each level and more than one period of experience for at least one entity. Missing values are represented by NAs. There can be entities with no experience (complete lines of NAs).

In order to give an easily reproducible example, we group states 1 and 3 of the Hachemeister data set into one cohort and states 2, 4 and 5 into another. This shows that data does not have to be sorted by level. The fitted model using the iterative estimators is:

\begin{Sinput}
> X <- cbind(cohort = c(1, 2, 1, 2, 2), hachemeister)
> fit <- cm(~cohort + cohort:state, data = X,
+           ratios = ratio.1:ratio.12,
+           weights = weight.1:weight.12,
+           method = "iterative")
> fit
\end{Sinput}
\begin{Soutput}
Call:
cm(formula = ~cohort + cohort:state, data = X, ratios = ratio.1:ratio.12,
    weights = weight.1:weight.12, method = "iterative")

Structure Parameters Estimators

  Collective premium: 1746
  Between cohort variance: 88981
  Within cohort/between state variance: 10952
  Within state variance: 139120026
\end{Soutput}

The function returns a fitted model object of class "cm" containing the estimators of the structure parameters. To compute the credibility premiums, one calls a method of predict for this class:

\begin{Sinput}
> predict(fit)
\end{Sinput}
\begin{Soutput}
$cohort
[1] 1949 1543

$state
[1] 2048 1524 1875 1497 1585
\end{Soutput}

One can also obtain a nicely formatted view of the most important results with a call to summary:

\begin{Sinput}
> summary(fit)
\end{Sinput}
\begin{Soutput}
Call:
cm(formula = ~cohort + cohort:state, data = X, ratios = ratio.1:ratio.12,
    weights = weight.1:weight.12, method = "iterative")

Structure Parameters Estimators

  Collective premium: 1746
  Between cohort variance: 88981
  Within cohort/between state variance: 10952
  Within state variance: 139120026

Detailed premiums

  Level: cohort
    cohort Indiv. mean Weight Cred. factor Cred. premium
         1        1967  1.407       0.9196          1949
         2        1528  1.596       0.9284          1543

  Level: state
    cohort state Indiv. mean Weight Cred. factor Cred. premium
         1     1        2061 100155       0.8874          2048
         2     2        1511  19895       0.6103          1524
         1     3        1806  13735       0.5195          1875
         2     4        1353   4152       0.2463          1497
         2     5        1600  36110       0.7398          1585
\end{Soutput}

The methods of predict and summary can both report for a subset of the levels by means of an argument levels. For example:

\begin{Sinput}
> summary(fit, levels = "cohort")
\end{Sinput}
\begin{Soutput}
Call:
cm(formula = ~cohort + cohort:state, data = X, ratios = ratio.1:ratio.12,
    weights = weight.1:weight.12, method = "iterative")

Structure Parameters Estimators

  Collective premium: 1746
  Between cohort variance: 88981
  Within cohort variance: 10952

Detailed premiums

  Level: cohort
    cohort Indiv. mean Weight Cred. factor Cred. premium
         1        1967  1.407       0.9196          1949
         2        1528  1.596       0.9284          1543
\end{Soutput}
\begin{Sinput}
> predict(fit, levels = "cohort")
\end{Sinput}
\begin{Soutput}
$cohort
[1] 1949 1543
\end{Soutput}

The results above differ from those of Goovaerts and Hoogstad (1987) for the same example because the formulas for the credibility premiums are different.

4 Bühlmann and Bühlmann-Straub models

As mentioned above, the Bühlmann and Bühlmann-Straub models are simply one-level hierarchical models. In this case, the Bühlmann-Gisler and Ohlsson estimators of the between variance parameters are both identical to the usual Bühlmann and Straub (1970) estimator
\begin{equation}
\hat{a} = \frac{w_{\Sigma\Sigma}}{w_{\Sigma\Sigma}^2 - \sum_{i=1}^{I} w_{i\Sigma}^2}
\left( \sum_{i=1}^{I} w_{i\Sigma} (X_{iw} - X_{ww})^2 - (I - 1) \hat{s}^2 \right),
\tag{11}
\end{equation}

and the iterative estimator
\begin{equation}
\tilde{a} = \frac{1}{I - 1} \sum_{i=1}^{I} z_i (X_{iw} - X_{zw})^2
\tag{12}
\end{equation}
is better known as the Bichsel-Straub estimator.

To fit the Bühlmann model using cm, one simply does not specify any weights:

\begin{Sinput}
> cm(~state, hachemeister, ratios = ratio.1:ratio.12)
\end{Sinput}
\begin{Soutput}
Call:
cm(formula = ~state, data = hachemeister, ratios = ratio.1:ratio.12)

Structure Parameters Estimators

  Collective premium: 1671
  Between state variance: 72310
  Within state variance: 46040
\end{Soutput}

In comparison, the results for the Bühlmann-Straub model using the Bichsel-Straub estimator are:

\begin{Sinput}
> cm(~state, hachemeister, ratios = ratio.1:ratio.12,
+    weights = weight.1:weight.12)
\end{Sinput}
\begin{Soutput}
Call:
cm(formula = ~state, data = hachemeister, ratios = ratio.1:ratio.12,
    weights = weight.1:weight.12)

Structure Parameters Estimators

  Collective premium: 1684
  Between state variance: 89639
  Within state variance: 139120026
\end{Soutput}
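For readers who want to see formula (11) spelled out, the following is a minimal sketch, not part of actuar, with ad hoc object names (ratios, weights, s2, a.hat). It computes the one-level quantities directly from the hachemeister matrix; the result can be compared with the between state variance reported by cm above.

\begin{verbatim}
## Minimal sketch of the Buhlmann-Straub estimator (11); not actuar code.
ratios  <- hachemeister[, 2:13]    # X_{it}: average claim amounts
weights <- hachemeister[, 14:25]   # w_{it}: numbers of claims
I <- nrow(ratios)
n <- ncol(ratios)

w.is <- rowSums(weights)                      # w_{i Sigma}
w.ss <- sum(w.is)                             # w_{Sigma Sigma}
X.iw <- rowSums(weights * ratios) / w.is      # individual weighted averages
X.ww <- sum(w.is * X.iw) / w.ss               # collective weighted average

## within variance, one-level version of (2)
s2 <- sum(weights * (ratios - X.iw)^2) / (I * (n - 1))

## between variance estimator (11)
a.hat <- w.ss / (w.ss^2 - sum(w.is^2)) *
    (sum(w.is * (X.iw - X.ww)^2) - (I - 1) * s2)
\end{verbatim}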

5 Regression model of Hachemeister

The regression model of Hachemeister (1975) is a generalization of the Bühlmann-Straub model. If the data show a systematic trend, the latter model will typically under- or over-estimate the true premium of an entity. The idea of Hachemeister was to fit to the data a regression model where the parameters are a credibility weighted average of an entity's regression parameters and the group's parameters.

In order to use cm to fit a credibility regression model to a data set, one simply has to supply as additional arguments regformula and regdata. The first one is a formula of the form ~terms describing the regression model and the second is a data frame of regressors. That is, arguments regformula and regdata are in every respect equivalent to arguments formula and data of lm, with the minor difference that regformula does not need to have a left hand side (and is ignored if present). For example, fitting the model
\begin{equation*}
X_{it} = \beta_0 + \beta_1 t + \varepsilon_t, \qquad t = 1, \dots, 12,
\end{equation*}
to the original data set of Hachemeister (1975) is done with

\begin{Sinput}
> fit <- cm(~state, hachemeister, regformula = ~ time,
+           regdata = data.frame(time = 1:12),
+           ratios = ratio.1:ratio.12,
+           weights = weight.1:weight.12)
> fit
\end{Sinput}
\begin{Soutput}
Call:
cm(formula = ~state, data = hachemeister, ratios = ratio.1:ratio.12,
    weights = weight.1:weight.12, regformula = ~time,
    regdata = data.frame(time = 1:12))

Structure Parameters Estimators

  Collective premium: 1469 32.05
  Between state variance: 24154 2700.0
                           2700  301.8
  Within state variance: 49870187
\end{Soutput}

Computing the credibility premiums requires giving the future values of the regressors, as in predict.lm:

[Figure 1: Collective, individual and credibility regression lines for State 4 of the Hachemeister data set. The point indicates the credibility premium.]

\begin{Sinput}
> predict(fit, newdata = data.frame(time = 13))
\end{Sinput}
\begin{Soutput}
[1] 2437 1651 2073 1507 1759
\end{Soutput}

It is well known that the basic regression model has a major drawback: there is no guarantee that the credibility regression line will lie between the collective and individual ones. This may lead to grossly inadequate premiums, as Figure 1 shows.

The solution proposed by Bühlmann and Gisler (1997) is simply to position the intercept at the barycenter of time instead of at the time origin (see also Bühlmann and Gisler, 2005, Section 8.4). In mathematical terms, this essentially amounts to using an orthogonal design matrix. By setting the argument adj.intercept to TRUE in the call, cm will automatically fit the credibility regression model with the intercept at the barycenter of time. The resulting regression coefficients have little meaning, but the predictions are sensible:

\begin{Sinput}
> fit2 <- cm(~state, hachemeister, regformula = ~ time,
+            regdata = data.frame(time = 1:12),
+            adj.intercept = TRUE,
+            ratios = ratio.1:ratio.12,
+            weights = weight.1:weight.12)
> summary(fit2, newdata = data.frame(time = 13))
\end{Sinput}
\begin{Soutput}
Call:
cm(formula = ~state, data = hachemeister, ratios = ratio.1:ratio.12,
    weights = weight.1:weight.12, regformula = ~time,
    regdata = data.frame(time = 1:12), adj.intercept = TRUE)

Structure Parameters Estimators

  Collective premium: -1675 117.1
  Between state variance: 93783    0
                              0 8046
  Within state variance: 49870187

Detailed premiums

  Level: state
    state Indiv. coef. Credibility matrix Adj. coef. Cred. premium
        1     -2062.46      0.9947 0.0000   -2060.41          2457
                216.97      0.0000 0.9413     211.10
        2     -1509.28      0.9740 0.0000   -1513.59          1651
                 59.60      0.0000 0.7630      73.23
        3     -1813.41      0.9627 0.0000   -1808.25          2071
                150.60      0.0000 0.6885     140.16
        4     -1356.75      0.8865 0.0000   -1392.88          1597
                 96.70      0.0000 0.4080     108.77
        5     -1598.79      0.9855 0.0000   -1599.89          1698
                 41.29      0.0000 0.8559      52.22
\end{Soutput}
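To make the barycenter idea more concrete, here is a small sketch using plain weighted least squares with lm rather than the credibility fit of cm (the object names x4, w4, time and bary are ad hoc and this is not what cm does internally): centring the time regressor at its weighted barycenter makes the intercept and time columns orthogonal with respect to the weights, so the fitted intercept becomes the weighted average ratio of the entity.

\begin{verbatim}
## Sketch only: barycenter-of-time reparametrization for state 4 with lm.
x4   <- hachemeister[4, 2:13]          # ratios of state 4
w4   <- hachemeister[4, 14:25]         # weights of state 4
time <- 1:12
bary <- sum(w4 * time) / sum(w4)       # weighted barycenter of time

lm(x4 ~ time, weights = w4)            # intercept at the time origin
lm(x4 ~ I(time - bary), weights = w4)  # intercept at the barycenter:
                                       # equals the weighted mean of x4
\end{verbatim}

In the credibility setting, this reparametrization is essentially what adj.intercept = TRUE triggers inside cm before the credibility weighting is applied.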

[Figure 2: Collective, individual and credibility regression lines for State 4 of the Hachemeister data set when the intercept is positioned at the barycenter of time. The point indicates the credibility premium.]

Figure 2 shows the beneficial effect of the intercept adjustment on the premium of State 4.

References

H. Belhadj, V. Goulet, and T. Ouellet. On parameter estimation in hierarchical credibility. ASTIN Bulletin, 39(2), 2009.

H. Bühlmann. Experience rating and credibility. ASTIN Bulletin, 5:157-165, 1969.

H. Bühlmann and A. Gisler. Credibility in the regression case revisited. ASTIN Bulletin, 27:83-98, 1997.

H. Bühlmann and A. Gisler. A course in credibility theory and its applications. Springer, 2005. ISBN 3-540-25753-5.

H. Bühlmann and W. S. Jewell. Hierarchical credibility revisited. Bulletin of the Swiss Association of Actuaries, 87:35-54, 1987.

H. Bühlmann and E. Straub. Glaubwürdigkeit für Schadensätze. Bulletin of the Swiss Association of Actuaries, 70:111-133, 1970.

M. J. Goovaerts and W. J. Hoogstad. Credibility theory. Number 4 in Surveys of Actuarial Studies. Nationale-Nederlanden N.V., Netherlands, 1987.

V. Goulet. Principles and application of credibility theory. Journal of Actuarial Practice, 6:5-62, 1998.

C. A. Hachemeister. Credibility for regression models with application to trend. In Credibility: Theory and Applications, Proceedings of the Berkeley Actuarial Research Conference on Credibility, pages 129-163, New York, 1975. Academic Press.

W. S. Jewell. The use of collateral data in credibility theory: a hierarchical model. Giornale dell'Istituto Italiano degli Attuari, 38:1-16, 1975.

E. Ohlsson. Simplified estimation of structure parameters in hierarchical credibility. Presented at the Zurich ASTIN Colloquium, 2005. URL http://www.actuaries.org/astin/colloquia/zurich/ohlsson.pdf.