Company Bankruptcy Prediction

Feb 10, 2021

The desire for unlimited power and money often leads to mistakes, which in turn lead to a fall.

Introduction

Bankruptcy is a legal proceeding involving a person or business that is unable to repay their outstanding debts. The bankruptcy process begins with a petition filed by the debtor, which is most common, or on behalf of creditors, which is less common. All of the debtor’s assets are measured and evaluated, and the assets may be used to repay a portion of outstanding debt.

Dataset

The data were collected from the Taiwan Economic Journal for the years 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange.

Company Bankruptcy Prediction with Machine Learning

First, we import the required libraries. Then we load the data from the data.csv file with the read_csv() function and store it in the dataframe df. A dataframe is a two-dimensional data structure that stores data in tabular form, i.e. in rows and columns. Now let's have a look at the data: head() displays the first five rows of the dataframe by default.
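As a minimal sketch of this loading step: the snippet below uses a tiny in-memory CSV as a stand-in for data.csv so that it runs on its own. The column names (bankrupt, debt_ratio, roa) and the values are made up for illustration and are not the real dataset's columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv; columns and values are invented.
csv_text = """bankrupt,debt_ratio,roa
0,0.21,0.08
0,0.35,0.05
1,0.92,-0.12
0,0.40,0.06
1,0.88,-0.09
0,0.15,0.11
"""

# With the real file this would simply be: df = pd.read_csv("data.csv")
df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default; head(n) returns the first n.
first_rows = df.head()
print(first_rows)
```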

tail() displays the last five rows of the dataframe by default.

shape gives the number of rows and columns in the dataset. describe() shows summary statistics for each numeric column of the dataframe, such as the count, mean, standard deviation, minimum, maximum and percentiles.
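Here is a small sketch of tail(), shape and describe() on a toy dataframe. The data is hypothetical and only there to make the snippet self-contained.

```python
import pandas as pd

# Hypothetical toy data, not the real dataset.
df = pd.DataFrame({
    "bankrupt": [0, 0, 1, 0, 1, 0],
    "debt_ratio": [0.21, 0.35, 0.92, 0.40, 0.88, 0.15],
    "roa": [0.08, 0.05, -0.12, 0.06, -0.09, 0.11],
})

last_rows = df.tail()       # last five rows by default
n_rows, n_cols = df.shape   # shape is an attribute, so no parentheses
stats = df.describe()       # count, mean, std, min, quartiles and max

print(last_rows)
print(f"{n_rows} rows, {n_cols} columns")
print(stats)
```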

The info() function prints a concise summary of the dataframe: the column names, the number of non-null values in each column, the data types and the memory usage.
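For illustration, this is what calling info() looks like on a small, hypothetical dataframe (the columns are invented):

```python
import pandas as pd

# Hypothetical toy data, not the real dataset.
df = pd.DataFrame({
    "bankrupt": [0, 0, 1, 0],
    "debt_ratio": [0.21, 0.35, 0.92, 0.40],
    "roa": [0.08, 0.05, -0.12, 0.06],
})

# info() prints its summary directly and returns None,
# so there is no need to wrap it in print().
df.info()
```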

Most scikit-learn algorithms cannot handle missing data, so we'll check whether any columns contain missing values. isnull().sum() returns the number of missing values in each column.
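A quick sketch of the missing-value check. I have put a few NaN values into a hypothetical toy dataframe on purpose so that the counts are not all zero.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with deliberately missing entries.
df = pd.DataFrame({
    "bankrupt": [0, 0, 1, 0],
    "debt_ratio": [0.21, np.nan, 0.92, 0.40],
    "roa": [0.08, 0.05, np.nan, np.nan],
})

# isnull() marks each cell as True/False; sum() then counts the True values per column.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing values:", total_missing)
```

If every count is zero, the dataset has no missing values and no imputation is needed.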

value_counts() returns a Series containing the counts of unique values.
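As a sketch, here is value_counts() applied to a hypothetical target column (the name bankrupt and the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical target column: 1 = bankrupt, 0 = not bankrupt.
y = pd.Series([0, 0, 1, 0, 1, 0], name="bankrupt")

counts = y.value_counts()                # absolute count per unique value
shares = y.value_counts(normalize=True)  # same, as a fraction of all rows

print(counts)
print(shares)
```

This is a handy way to see how balanced the two classes are before training.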

Now we will plot our data to understand the relationships between the numerical features.
I have used both the matplotlib and seaborn Python libraries to visualize the data.

A countplot is like a histogram or bar chart for a categorical variable. It simply shows the number of occurrences of each category.
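A minimal countplot sketch on a hypothetical target column. The Agg backend line is only there so the snippet also runs without a display; in a notebook you can drop it.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import pandas as pd
import seaborn as sns

# Hypothetical toy data: 1 = bankrupt, 0 = not bankrupt.
df = pd.DataFrame({"bankrupt": [0, 0, 1, 0, 1, 0]})

# One bar per category, with the bar height equal to the number of rows.
ax = sns.countplot(x="bankrupt", data=df)
ax.set_title("Number of companies per class")
```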

relplot() is used for visualizing statistical relationships using two common approaches: scatter plots and line plots.
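Here is a short relplot() sketch, again on hypothetical columns. By default it draws a scatter plot; passing kind="line" would draw a line plot instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import pandas as pd
import seaborn as sns

# Hypothetical toy data, not the real dataset.
df = pd.DataFrame({
    "bankrupt": [0, 0, 1, 0, 1, 0],
    "debt_ratio": [0.21, 0.35, 0.92, 0.40, 0.88, 0.15],
    "roa": [0.08, 0.05, -0.12, 0.06, -0.09, 0.11],
})

# Scatter plot of two numerical features, coloured by the target class.
g = sns.relplot(data=df, x="debt_ratio", y="roa", hue="bankrupt")
```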

Now we will see how these features are correlated with each other using a correlation heatmap from the seaborn library.

It combines the visualization of a heatmap and the information of the correlation matrix in a visually appealing way.
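A correlation heatmap can be sketched like this. The toy dataframe is hypothetical, and the colour map and the fixed colour range from -1 to 1 are my own choices for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import pandas as pd
import seaborn as sns

# Hypothetical toy data, not the real dataset.
df = pd.DataFrame({
    "bankrupt": [0, 0, 1, 0, 1, 0],
    "debt_ratio": [0.21, 0.35, 0.92, 0.40, 0.88, 0.15],
    "roa": [0.08, 0.05, -0.12, 0.06, -0.09, 0.11],
})

# corr() computes the pairwise Pearson correlation between numeric columns.
corr = df.corr()

# annot=True writes each correlation value into its cell.
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```

With many features the annotations get crowded, so annot=False and a larger figure size may read better.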

Here, ‘X’ is my input variable, which contains the attributes required for training the model, whereas ‘y’ is my target variable, i.e. the value the model should predict.
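A sketch of how X and y can be separated. The target column name bankrupt is an assumption here, so substitute the actual name of the target column in your dataframe.

```python
import pandas as pd

# Hypothetical toy data, not the real dataset.
df = pd.DataFrame({
    "bankrupt": [0, 0, 1, 0, 1, 0],
    "debt_ratio": [0.21, 0.35, 0.92, 0.40, 0.88, 0.15],
    "roa": [0.08, 0.05, -0.12, 0.06, -0.09, 0.11],
})

X = df.drop(columns=["bankrupt"])  # every column except the target
y = df["bankrupt"]                 # the target column only

print(X.shape, y.shape)
```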

To train my model, I imported train_test_split from sklearn.model_selection and used Logistic Regression.

The test_size I have given is 0.2, i.e. 20%. Generally, the test_size varies from 20% to 30%, and the remaining 70% to 80% of the data is used for training the model.
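The split and training steps can be sketched as follows. To keep the snippet self-contained, I generate synthetic data with make_classification instead of using the real dataset. The random_state values, max_iter=1000 and the accuracy check at the end are my own assumptions, not settings taken from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y (assumption: 200 rows, 5 features).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A higher max_iter gives the solver more room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Train rows: {len(X_train)}, test rows: {len(X_test)}")
print(f"Accuracy on the test set: {accuracy:.2f}")
```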