Monday 8 April 2019

How to find mode of a variable using Scipy library to impute missing values?

Let's impute the missing values in a variable with the mode of the values present in that variable. Mode imputation replaces each missing value with the value that occurs most frequently in the variable. We will use the SciPy library for this.

Consider a Loan Prediction dataset. We will impute the missing values in Loan_Amount_Term using the mode method.
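
To make the idea concrete before touching the real data, here is a minimal sketch in plain Python. The list of loan terms is made up for illustration and is not taken from the dataset; None stands in for a missing value.

```python
from collections import Counter

# Hypothetical loan terms in months; None marks a missing value.
loan_terms = [360, 180, 360, None, 360, 240, None]

# Count only the values that are actually present.
counts = Counter(value for value in loan_terms if value is not None)

# most_common(1) returns a list holding one (value, count) pair.
most_frequent, frequency = counts.most_common(1)[0]

# Replace every missing entry with the most frequent value.
imputed = [most_frequent if value is None else value for value in loan_terms]

print(most_frequent, frequency)
print(imputed)
```

The steps that follow do the same thing, with SciPy finding the most frequent value and pandas filling the gaps.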

Step 1: Import the required libraries

import pandas as pd
import numpy as np
from scipy.stats import mode

Step 2: Load the dataset

dataset = pd.read_csv("C:/train_loan_prediction.csv")

Step 3: Impute missing values with the mode

We will impute the missing values in the Loan_Amount_Term variable. Currently there are 14 missing values in this variable. You can confirm this by executing the following statement:

dataset['Loan_Amount_Term'].isnull().sum()
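
The same check can be run on every column at once. The sketch below uses a small made-up DataFrame, so the column values and counts are assumptions for illustration and will differ from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame; the real dataset has many more rows and columns.
toy = pd.DataFrame({
    "LoanAmount": [128.0, np.nan, 66.0, 120.0],
    "Loan_Amount_Term": [360.0, 360.0, np.nan, np.nan],
})

# Missing values in a single column, as in the statement above.
missing_in_term = int(toy["Loan_Amount_Term"].isnull().sum())

# Missing values in every column at once (returns one count per column).
missing_per_column = toy.isnull().sum()

print(missing_in_term)
print(missing_per_column)
```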

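Now let's take the mode of this variable using the mode function from scipy.stats. Missing values are dropped first so that they cannot affect the result, and np.atleast_1d keeps the indexing working whether SciPy returns the mode as a one-element array or as a scalar.

mode(dataset['Loan_Amount_Term'].dropna())
np.atleast_1d(mode(dataset['Loan_Amount_Term'].dropna()).mode)[0]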

Output: 
ModeResult(mode=array([360.]), count=array([526]))
360.0

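It says that the most frequently occurring value is 360 and that it occurs 526 times. This output is from an older SciPy release; newer releases return plain scalars instead of one-element arrays for one-dimensional input, so the printed form depends on the installed version.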
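
If you want to double-check the mode and its count, pandas can produce the same two numbers. The following is a minimal sketch on made-up values standing in for the real column; the series contents and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import mode

# Hypothetical values standing in for dataset['Loan_Amount_Term'].
terms = pd.Series([360.0, 180.0, 360.0, np.nan, 360.0, 240.0, np.nan])

# SciPy: drop NaN first; np.atleast_1d copes with array or scalar results.
result = mode(terms.dropna().to_numpy())
scipy_mode = float(np.atleast_1d(result.mode)[0])
scipy_count = int(np.atleast_1d(result.count)[0])

# pandas: value_counts ignores NaN by default and sorts by frequency.
counts = terms.value_counts()
pandas_mode = float(counts.index[0])
pandas_count = int(counts.iloc[0])

print(scipy_mode, scipy_count)
print(pandas_mode, pandas_count)
```

Both approaches should agree whenever the column has a single most frequent value.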

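Let's impute the missing values with this value. The result is assigned back to the column rather than calling fillna with inplace=True on the selected column, because the in-place form on a column selection may not update the DataFrame in newer pandas versions:

loan_term_mode = np.atleast_1d(mode(dataset['Loan_Amount_Term'].dropna()).mode)[0]
dataset['Loan_Amount_Term'] = dataset['Loan_Amount_Term'].fillna(loan_term_mode)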

Now count the number of missing values in this variable:

dataset['Loan_Amount_Term'].isnull().sum()

It will be zero. So, we have imputed all the missing values with the most frequent value in the variable.
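
A second sanity check is that the count of the most frequent value grows by exactly the number of values that were missing. With the numbers shown above, that would mean 526 + 14 = 540 rows equal to 360 after imputation. The sketch below wraps the steps in a helper and runs this check on made-up data; impute_with_mode is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import mode


def impute_with_mode(frame, column):
    """Return a copy of frame with NaN in column replaced by the column's mode."""
    present = frame[column].dropna().to_numpy()
    # np.atleast_1d copes with SciPy returning an array or a scalar.
    fill_value = np.atleast_1d(mode(present).mode)[0]
    result = frame.copy()
    result[column] = result[column].fillna(fill_value)
    return result, fill_value


# Hypothetical miniature column, not the real dataset.
toy = pd.DataFrame({"Loan_Amount_Term": [360.0, np.nan, 180.0, 360.0, np.nan]})
missing_before = int(toy["Loan_Amount_Term"].isnull().sum())
count_before = int((toy["Loan_Amount_Term"] == 360.0).sum())

imputed, fill_value = impute_with_mode(toy, "Loan_Amount_Term")

missing_after = int(imputed["Loan_Amount_Term"].isnull().sum())
count_after = int((imputed["Loan_Amount_Term"] == fill_value).sum())

# Every missing value became the mode, so its count grows by missing_before.
print(missing_after, count_before, missing_before, count_after)
```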

About the Author

I have more than 10 years of experience in the IT industry. LinkedIn Profile

I am currently experimenting with neural networks in deep learning. I am learning Python, TensorFlow and Keras.

Author: I am an author of a book on deep learning.

Quiz: I run an online quiz on machine learning and deep learning.