Heart Failure Prediction

Pranav Kumar
4 min read · Feb 7, 2021

Protect your heart

Image source: Google Images

Introduction

Heart disease describes a range of conditions that affect your heart. Diseases under the heart disease umbrella include blood vessel diseases, such as coronary artery disease, heart rhythm problems (arrhythmias) and heart defects you’re born with (congenital heart defects), among others.

Dataset

Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide.
Heart failure is a common event caused by CVDs and this dataset contains 12 features that can be used to predict mortality by heart failure.

Heart Failure Prediction with Machine Learning

First, we import the required libraries. Then we load the data from the heart_failure_clinical_records_dataset.csv file using the read_csv() function and store it in a dataframe, df. A dataframe is a two-dimensional data structure that stores data in tabular form, i.e. rows and columns. To take a first look at the data, we call head(), which displays the first five rows of the dataframe by default.
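The loading step can be sketched roughly as follows. To keep the snippet runnable without the file, it reads a tiny inline excerpt. The column names and values in that excerpt are assumptions made for illustration and are not taken from the notebook.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for
# heart_failure_clinical_records_dataset.csv; the column names and values
# are assumptions for illustration only.
sample_csv = io.StringIO(
    "age,ejection_fraction,serum_creatinine,DEATH_EVENT\n"
    "75,20,1.9,1\n"
    "55,38,1.1,0\n"
    "65,20,1.3,1\n"
)

# With the real file on disk, this line would instead be:
# df = pd.read_csv("heart_failure_clinical_records_dataset.csv")
df = pd.read_csv(sample_csv)

# head() returns the first five rows by default (here, all three).
print(df.head())
```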

The shape attribute gives the number of rows and columns in the dataset. The describe() function shows summary statistics of a dataframe, such as the mean, standard deviation and percentiles.
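A minimal sketch of these two calls, using a small made-up dataframe in place of the real data (the column names and values are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe.
df = pd.DataFrame(
    {
        "age": [75, 55, 65, 50],
        "ejection_fraction": [20, 38, 20, 60],
        "DEATH_EVENT": [1, 0, 1, 0],
    }
)

# shape is an attribute (no parentheses): a (rows, columns) tuple.
rows, cols = df.shape
print(f"{rows} rows, {cols} columns")

# describe() returns count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()
print(summary)
```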

The info() function prints a concise summary of the dataframe. The value_counts() function returns a series containing the counts of unique values.
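As an illustration, here is how both could be used on a small made-up dataframe. The target column name DEATH_EVENT is an assumption about the dataset, and the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe.
df = pd.DataFrame(
    {
        "age": [75, 55, 65, 50, 60],
        "DEATH_EVENT": [1, 0, 1, 0, 0],
    }
)

# info() prints column names, non-null counts and dtypes.
df.info()

# value_counts() counts how often each unique value occurs in a column.
counts = df["DEATH_EVENT"].value_counts()
print(counts)
```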

Next, we plot the data to understand the relationships between the numerical features.
I have used both the matplotlib and seaborn Python libraries to visualize the data.

A countplot is like a histogram or bar graph for a categorical variable. It simply shows the number of occurrences of each category.
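A countplot of the target column might look like the sketch below. The data is made up, and DEATH_EVENT as the column name is an assumption.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data: three survivors (0) and two deaths (1).
df = pd.DataFrame({"DEATH_EVENT": [0, 0, 0, 1, 1]})

# One bar per category; the bar height is the number of rows in that category.
ax = sns.countplot(x="DEATH_EVENT", data=df)
ax.set_title("Count of each outcome")

# In a script, call matplotlib.pyplot.show() to display the figure;
# in a Jupyter notebook it is rendered automatically.
```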

Pairplot plots pairwise relationships in a dataset. By default, this function creates a grid of Axes such that each numeric variable in the data is shared across the y-axes of a single row and the x-axes of a single column.

Now we will see how these features are correlated with each other using a correlation heatmap from the seaborn library.

It combines the visualization of a heatmap and the information of the correlation matrix in a visually appealing way.
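One way to draw such a heatmap is sketched below on made-up data. The annot and cmap settings are my own choices for readability, not necessarily those used in the notebook.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data.
df = pd.DataFrame(
    {
        "age": [75, 55, 65, 50, 60, 45],
        "ejection_fraction": [20, 38, 20, 60, 35, 50],
        "DEATH_EVENT": [1, 0, 1, 0, 1, 0],
    }
)

# corr() computes the pairwise Pearson correlation between columns.
corr = df.corr()

# annot=True writes each coefficient into its cell; cmap picks the colours.
ax = sns.heatmap(corr, annot=True, cmap="coolwarm")
```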

Here, ‘X’ is my input variable, which contains the attributes required for training the model, whereas ‘y’ is my target variable.
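Splitting a dataframe into ‘X’ and ‘y’ could be done along these lines. The target column name DEATH_EVENT is an assumption, and the data is made up.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe.
df = pd.DataFrame(
    {
        "age": [75, 55, 65, 50],
        "ejection_fraction": [20, 38, 20, 60],
        "DEATH_EVENT": [1, 0, 1, 0],
    }
)

# X holds every column except the target; y holds only the target.
X = df.drop(columns=["DEATH_EVENT"])
y = df["DEATH_EVENT"]

print(X.shape, y.shape)
```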

To split the data and train my model, I imported train_test_split from the model_selection module of the sklearn library and used Logistic Regression.

The test_size I have used is 0.2, i.e. 20% of the data is held out for testing. Generally the test_size ranges from 20% to 30%, and the remaining 70% to 80% of the data is used for training the model.
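The split and training steps can be sketched as follows. The data here is randomly generated so the snippet runs on its own. The feature names, random_state and max_iter are assumptions of this sketch, so the printed accuracy says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 100 patients with two made-up features.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "age": rng.integers(40, 90, size=100),
        "ejection_fraction": rng.integers(15, 65, size=100),
    }
)
# Made-up rule so that the target depends on the features.
y = (X["ejection_fraction"] < 35).astype(int)

# test_size=0.2 holds out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_iter is raised so the solver has room to converge on unscaled features.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out test set.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```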

Thanks for visiting…

You can take a closer look at the Jupyter notebook in my GitHub repository.
