Insurance Claim Payment Prediction

Pranav Kumar
3 min read · Feb 5, 2021

Isolation is my COVID-19 insurance policy.

Swedish Insurance Dataset

The dataset is called the “Auto Insurance in Sweden” dataset and involves predicting the total payment for all the claims in thousands of Swedish Kronor (y) given the total number of claims (x).

This means that for a new number of claims (x) we will be able to predict the total payment of claims (y).

Insurance Claim Payment Prediction with Machine Learning

First, we will import the required libraries. Then, we will load the data from the Insurance.csv file using the read_csv() function, supplying attribute names through its names parameter. The imported data is stored in the dataframe df. A dataframe is a two-dimensional data structure that stores data in tabular form, i.e. rows and columns. Now we can have a look at the data: head() displays the first five rows of the dataframe by default.
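The loading step might look like the sketch below. The original notebook isn't shown here, so the rows are an illustrative stand-in for Insurance.csv (which has no header row), and the column names X and Y are assumptions:

```python
import io

import pandas as pd

# Illustrative stand-in for Insurance.csv, which has no header row;
# X = number of claims, Y = total payment in thousands of Kronor.
csv_text = """108,392.5
19,46.2
13,15.7
124,422.2
40,119.4
"""

# names= supplies column labels because the file itself has none
df = pd.read_csv(io.StringIO(csv_text), names=["X", "Y"])
print(df.head())  # first five rows of the dataframe
```

In the real notebook the first argument would simply be the path to Insurance.csv.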

The shape attribute gives the number of rows and columns in the dataset. describe() shows statistical details of the dataframe, such as the count, mean, standard deviation, and percentiles.
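For example, on a small stand-in dataframe (the values below are illustrative, not the full dataset):

```python
import pandas as pd

# Small illustrative sample (X = number of claims, Y = total payment)
df = pd.DataFrame({"X": [108, 19, 13, 124, 40],
                   "Y": [392.5, 46.2, 15.7, 422.2, 119.4]})

print(df.shape)       # (rows, columns) tuple
print(df.describe())  # count, mean, std, min, quartiles, max per column
```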

The info() function gives a concise summary of the dataframe. Scikit-learn's algorithms generally cannot handle missing data, so we'll inspect the columns to see whether any attributes contain missing values.
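A sketch of that check, again on an illustrative sample dataframe:

```python
import pandas as pd

df = pd.DataFrame({"X": [108, 19, 13, 124, 40],
                   "Y": [392.5, 46.2, 15.7, 422.2, 119.4]})

df.info()                 # dtypes, non-null counts, memory usage
print(df.isnull().sum())  # number of missing values per column
```

A column sum of zero means no missing values need to be imputed or dropped.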

Now we will visualize the data to understand the relationship between the numerical features. I have used both the Python matplotlib and seaborn libraries for this.
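Since there is one input and one target, a scatter plot with a fitted line is a natural first look. This is a minimal sketch with illustrative data; the file name and labels are my own choices, not from the original notebook:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"X": [108, 19, 13, 124, 40],
                   "Y": [392.5, 46.2, 15.7, 422.2, 119.4]})

# Scatter plot with a fitted regression line
ax = sns.regplot(x="X", y="Y", data=df)
ax.set_xlabel("Number of claims")
ax.set_ylabel("Total payment (thousands of Kronor)")
plt.savefig("claims_scatter.png")
```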

Now we will see how these features are correlated with each other using a correlation heatmap from the seaborn library.

It combines the visualization of a heatmap and the information of the correlation matrix in a visually appealing way.
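A minimal sketch of the heatmap, assuming the same illustrative sample as above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"X": [108, 19, 13, 124, 40],
                   "Y": [392.5, 46.2, 15.7, 422.2, 119.4]})

corr = df.corr()  # Pearson correlation matrix
sns.heatmap(corr, annot=True, cmap="coolwarm")  # annot writes values in cells
plt.savefig("corr_heatmap.png")
print(corr)
```

A strong positive correlation between the number of claims and the total payment is what makes linear regression a sensible model here.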

Here, ‘X’ is my input variable, which contains the attributes required for training the model, while ‘y’ is my target variable.
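The split might look like this (illustrative data; note that scikit-learn expects the feature matrix to be two-dimensional):

```python
import pandas as pd

df = pd.DataFrame({"X": [108, 19, 13, 124, 40],
                   "Y": [392.5, 46.2, 15.7, 422.2, 119.4]})

X = df[["X"]]  # feature matrix (double brackets keep it 2-D)
y = df["Y"]    # target vector
```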

For training my model I have imported train_test_split from sklearn's model_selection module and used Linear Regression.
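A runnable sketch of that step, using a slightly larger illustrative sample so the 20% test split described below leaves a few rows on each side (the random_state value is my own choice):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame(
    {"X": [108, 19, 13, 124, 40, 57, 23, 14, 45, 10],
     "Y": [392.5, 46.2, 15.7, 422.2, 119.4, 170.9, 56.9, 77.5, 214.0, 65.3]})

X = df[["X"]]
y = df["Y"]

# 80/20 train/test split; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out 20%
```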

I have also used AdaBoostRegressor. AdaBoost is a meta-algorithm, which means it can be wrapped around other algorithms to improve their performance. Boosting combines many weak learners, fitted one after another, into a single stronger predictor.
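A minimal sketch of the AdaBoost step on the same illustrative data (by default, scikit-learn's AdaBoostRegressor boosts shallow decision trees):

```python
import pandas as pd
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split

df = pd.DataFrame(
    {"X": [108, 19, 13, 124, 40, 57, 23, 14, 45, 10],
     "Y": [392.5, 46.2, 15.7, 422.2, 119.4, 170.9, 56.9, 77.5, 214.0, 65.3]})

X_train, X_test, y_train, y_test = train_test_split(
    df[["X"]], df["Y"], test_size=0.2, random_state=42)

# Each successive weak learner concentrates on the examples
# the ensemble so far has predicted worst
ada = AdaBoostRegressor(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
print(ada.predict(X_test))
```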

The test_size I have given is 0.2, i.e. 20%. Generally the test_size varies from 20% to 30%, and the remaining 70% to 80% of the data is used for training the model.

Thanks for visiting…

You can take a closer look at the Jupyter notebook in my GitHub repository.
