3 Quick Ways to Create Graphs of Your Class Distributions in Python

by Seth Bunke · March 29, 2018

Whenever we build machine learning models, we have to examine the data first. For classification problems, at the very least, we need to look at the class distributions to see how balanced or unbalanced our training set is. Here are 3 quick ways to visualize class distributions in Python (each in fewer than 10 lines of code):

Python Setup Code (so we have data to look at):

import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Use the line below to show inline in a notebook
%matplotlib inline

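# Generate random integers for class IDs, then find
# the unique ones and their counts
y_values = [random.randint(0, 20) for _ in range(101)]
unique, counts = np.unique(y_values, return_counts=True)
# Split into training and validation sets; slicing at 80 in both
# places ensures no sample is skipped between the two
y_train = y_values[:80]
y_valid = y_values[80:]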
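
Before plotting, a quick numeric check can give a rough idea of how lopsided the classes are. The sketch below is not part of the original setup code: `class_balance_summary` is a hypothetical helper name, it assumes a non-empty list of labels, and the ratio of the largest to the smallest class count is only one possible rule of thumb.

```python
from collections import Counter


def class_balance_summary(labels):
    """Return per-class counts and the max/min count ratio for a list of labels."""
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    # A ratio near 1.0 suggests balanced classes; larger values mean more imbalance
    return dict(sorted(counts.items())), largest / smallest


# Small hand-made example (hypothetical labels, not the random data above)
example_labels = [0, 0, 0, 0, 1, 1, 2]
counts_by_class, imbalance_ratio = class_balance_summary(example_labels)
print(counts_by_class)
print(imbalance_ratio)
```

Calling `class_balance_summary(y_values)` on the setup data should report the same per-class counts that `np.unique(y_values, return_counts=True)` produces, just as a dictionary.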

Quick but “Fugly”

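# Histogram of the raw labels (not the unique values, which each occur once).
# pandas uses 10 bins by default, so neighboring classes share a bar.
pd.DataFrame(y_values)[0].hist()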

Class distribution graph

Quick and a Little Easier on the Eyes

plt.bar(unique, counts, 1)
plt.title('Class Frequency')
plt.xlabel('Class')
plt.ylabel('Frequency')
plt.show()

Class distribution graph with labels

Even Better (both training and validation data are shown)

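# Training counts are drawn first; validation bars are drawn on top of them
unique, counts = np.unique(y_train, return_counts=True)
plt.bar(unique, counts, label='Training')
unique, counts = np.unique(y_valid, return_counts=True)
plt.bar(unique, counts, label='Validation')

plt.title('Class Frequency')
plt.xlabel('Class')
plt.ylabel('Frequency')
plt.legend()

plt.show()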

Class distribution graph – training and validation data
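
Because the overlaid bars share the same x positions, a validation bar can hide part of a training bar. One possible alternative is to draw the two splits side by side. The sketch below makes a few assumptions: class IDs are integers, and `aligned_counts` and `plot_split_counts` are hypothetical helper names introduced only for this illustration.

```python
from collections import Counter


def aligned_counts(y_train, y_valid):
    """Count each class in both splits over the same sorted list of classes."""
    train_counter = Counter(y_train)
    valid_counter = Counter(y_valid)
    classes = sorted(set(train_counter) | set(valid_counter))
    # Counter returns 0 for a class that is missing from one split
    train_counts = [train_counter[c] for c in classes]
    valid_counts = [valid_counter[c] for c in classes]
    return classes, train_counts, valid_counts


def plot_split_counts(classes, train_counts, valid_counts, width=0.4):
    """Draw training and validation counts as side-by-side bars."""
    # Imported here so aligned_counts can be used without matplotlib installed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Shift each split half a bar width left or right of the integer class ID
    ax.bar([c - width / 2 for c in classes], train_counts, width, label='Training')
    ax.bar([c + width / 2 for c in classes], valid_counts, width, label='Validation')
    ax.set_title('Class Frequency')
    ax.set_xlabel('Class')
    ax.set_ylabel('Frequency')
    ax.legend()
    return fig, ax


# Hypothetical toy labels: class 2 is missing from the validation split
classes, train_counts, valid_counts = aligned_counts([0, 0, 1, 2, 2, 2], [0, 1, 1])
print(classes, train_counts, valid_counts)
```

With the setup data, `plot_split_counts(*aligned_counts(y_train, y_valid))` followed by `plt.show()` should give one pair of bars per class, including classes that appear in only one of the two splits.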

Here are some of my other articles on Pandas.
