Zoo Animal Classification

Pranav Kumar
3 min read · Jan 30, 2021


Here we will walk through classifying zoo animals into their class types.

Classification assigns objects to categories. It is a supervised machine learning problem, which means the model learns from a labeled dataset.

This dataset consists of 101 animals from a zoo.
There are 16 variables describing various traits of the animals.
The 7 class types are: Mammal, Bird, Reptile, Fish, Amphibian, Bug and Invertebrate.

First, we import the data from the zoo.data file using the read_csv() function and supply the attribute names through its names parameter. The imported data is stored in the dataframe df. A dataframe is a two-dimensional data structure that stores data in tabular form, i.e. rows and columns. Then we have a look at the data: head() displays the first five rows of the dataframe by default.
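As a rough sketch of this step, the snippet below reads comma-separated rows with read_csv() and the names parameter. The column names are assumptions (zoo.data has no header row, and the notebook may use different names), and an in-memory stand-in with three illustrative rows replaces the real file so the example runs on its own.

```python
import io

import pandas as pd

# Assumed column names; the original notebook may name them differently.
columns = [
    "animal_name", "hair", "feathers", "eggs", "milk", "airborne",
    "aquatic", "predator", "toothed", "backbone", "breathes", "venomous",
    "fins", "legs", "tail", "domestic", "catsize", "class_type",
]

# Stand-in for the file: three illustrative rows in the same layout.
# With the real file you would pass the path "zoo.data" instead.
sample = io.StringIO(
    "aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1\n"
    "bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4\n"
    "chicken,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2\n"
)

# names= labels the columns because the data has no header row.
df = pd.read_csv(sample, names=columns)

# head() shows the first five rows (here only three exist).
print(df.head())
```

Without names, read_csv() would treat the first data row as the header, so passing the names explicitly matters for a headerless file like this one.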

The shape attribute gives the number of rows and columns in the dataset. describe() shows summary statistics of a dataframe, such as the count, mean, standard deviation and percentiles of each numeric column.
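A minimal sketch of both calls follows. The small dataframe here is a hypothetical stand-in for df, with made-up column names, so the snippet runs without the data file.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the zoo dataframe.
df = pd.DataFrame({
    "animal_name": ["aardvark", "bass", "chicken", "frog"],
    "legs": [4, 0, 2, 4],
    "tail": [0, 1, 1, 0],
    "class_type": [1, 4, 2, 5],
})

# shape is an attribute, not a method: a (rows, columns) tuple.
print(df.shape)

# describe() summarizes numeric columns only, so animal_name is skipped.
summary = df.describe()
print(summary)
```

Note that shape is written without parentheses, while describe() is a method call.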

The info() function prints a concise summary of the dataframe: column names, non-null counts and data types. value_counts() returns a Series containing the count of each unique value in a column.
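The two calls can be sketched as below. The dataframe is again a hypothetical stand-in, and class_type is an assumed name for the target column.

```python
import pandas as pd

# Hypothetical stand-in for the zoo dataframe.
df = pd.DataFrame({
    "animal_name": ["aardvark", "bass", "chicken", "dove", "frog"],
    "legs": [4, 0, 2, 2, 4],
    "class_type": [1, 4, 2, 2, 5],
})

# info() prints its summary directly and returns None.
df.info()

# value_counts() counts each unique value, most frequent first.
counts = df["class_type"].value_counts()
print(counts)
```

Running value_counts() on the target column is a quick way to see how many animals fall into each class type.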

Now, we will use matplotlib and seaborn for visualization.

Let's have a look at the correlation heatmap. It presents the correlation matrix as a color-coded grid, which makes strongly related attributes easy to spot.
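One way to draw such a heatmap is sketched below, assuming seaborn's heatmap() on top of the pandas corr() matrix. The data is randomly generated binary traits rather than the real zoo attributes, and the color map and figure size are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this also runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random 0/1 traits as a stand-in for the real attributes.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 2, size=(30, 4)),
    columns=["hair", "feathers", "eggs", "milk"],
)

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()

# annot=True writes each coefficient into its cell.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heatmap (synthetic data)")
fig.tight_layout()
```

In a Jupyter notebook the figure renders inline, so the backend line can be dropped there.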

The animal name attribute is dropped because it only labels each row and is not a trait the model can learn from.

Here, ‘X’ is the input variable, which contains the attributes used to train the model, whereas ‘y’ is the target variable, i.e. the class type we want to predict.
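Dropping the name column and separating inputs from the target might look like the sketch below. The column names animal_name and class_type are assumptions, and the tiny dataframe is a stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the zoo dataframe.
df = pd.DataFrame({
    "animal_name": ["aardvark", "bass", "chicken", "frog"],
    "hair": [1, 0, 0, 0],
    "feathers": [0, 0, 1, 0],
    "legs": [4, 0, 2, 4],
    "class_type": [1, 4, 2, 5],
})

# Remove the name column, which is not used for training.
features = df.drop(columns=["animal_name"])

# X holds the input attributes, y holds the class type to predict.
X = features.drop(columns=["class_type"])
y = features["class_type"]

print(X.columns.tolist())
print(y.tolist())
```

drop() returns a new dataframe by default, so df itself is left unchanged.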

To train the model, I imported train_test_split from sklearn.model_selection to split the data, and used a Naive Bayes classifier.

I set test_size to 0.2, i.e. 20% of the data is held out for testing. The test_size is typically between 20% and 30%, and the remaining 70% to 80% of the data is used for training the model.
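The split and training steps can be sketched as follows. The post does not say which Naive Bayes variant was used, so GaussianNB is an assumption here, as is random_state=42. The data is synthetic (101 rows and 16 binary features, matching the dataset's dimensions only), so the accuracy it prints says nothing about the real zoo data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in: 101 rows, 16 binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(101, 16))
# Synthetic target with four classes derived from two features.
y = 1 + X[:, 0] + 2 * X[:, 1]

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# GaussianNB is an assumed choice of Naive Bayes variant.
model = GaussianNB()
model.fit(X_train, y_train)

# Predict on the held-out rows and score against the true labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(X_train.shape, X_test.shape)
print(accuracy)
```

With 101 rows and test_size=0.2, scikit-learn rounds the test set up to 21 rows and leaves 80 for training.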

We can then inspect the X_test and y_test data.

The predict_proba() function returns an array with one row per sample and one column per class, where each entry is the predicted probability of that class.
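A small sketch of predict_proba() is shown below, again assuming GaussianNB and synthetic data in place of the real features. For brevity the model is fitted on all rows and then queried on the first five.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data: 60 rows, 3 binary features.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(60, 3))
y = 1 + X[:, 0] + 2 * X[:, 1]

# GaussianNB is an assumed choice of Naive Bayes variant.
model = GaussianNB()
model.fit(X, y)

# One row per sample, one column per class in model.classes_.
proba = model.predict_proba(X[:5])

print(model.classes_)
print(proba.shape)
print(proba.sum(axis=1))  # each row sums to 1
```

The columns follow the order of model.classes_, so that attribute tells you which class each probability refers to.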

Thanks for visiting…

You can find the full Jupyter notebook in my GitHub repository.
