Let's impute the missing values in a variable with the mode of the values present in that variable. Mode imputation replaces each missing value with the value that occurs most frequently in the variable. We will use the scipy library for this.
Consider a Loan Prediction dataset. We will impute the missing values in Loan_Amount_Term using the mode.
Step 1: Import the required libraries
import pandas as pd
import numpy as np
from scipy.stats import mode
Step 2: Load the dataset
dataset = pd.read_csv("C:/train_loan_prediction.csv")
Step 3: Impute missing values with the mode
We will impute the missing values in the Loan_Amount_Term variable. There are currently 14 missing values in this variable. You can confirm this by executing the following statement:
dataset['Loan_Amount_Term'].isnull().sum()
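If you do not have the CSV at hand, the same check can be tried on a small hand-made DataFrame. This is only a sketch: the values below are made up for illustration, and the LoanAmount column is a hypothetical second column, not something taken from the dataset above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the loan dataset (values are made up)
toy = pd.DataFrame({
    "Loan_Amount_Term": [360.0, 180.0, np.nan, 360.0, np.nan],
    "LoanAmount": [120.0, np.nan, 66.0, 141.0, 95.0],
})

# Missing values in one column
missing_in_term = int(toy["Loan_Amount_Term"].isnull().sum())

# Missing values in every column at once
missing_per_column = toy.isnull().sum()

print(missing_in_term)
print(missing_per_column)
```

Calling isnull().sum() on the whole DataFrame is a quick way to see which variables need imputation before picking one.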
Now let's take the mode of this variable using the mode function from the scipy library.
mode(dataset['Loan_Amount_Term'])
mode(dataset['Loan_Amount_Term']).mode[0]
Output:
ModeResult(mode=array([360.]), count=array([526]))
360.0
The output shows that the most frequently occurring value is 360 and that it occurs 526 times.
Let's impute the missing values with this value:
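Depending on the installed SciPy version, the shape of the result returned by mode (and therefore the .mode[0] indexing) and its treatment of NaN may differ from the output shown above. As an alternative that does not rely on scipy, here is a minimal sketch using pandas' own Series.mode(), which skips NaN by default. The toy values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column (values are made up)
term = pd.Series(
    [360.0, 180.0, np.nan, 360.0, 480.0, np.nan, 360.0],
    name="Loan_Amount_Term",
)

# Series.mode() returns a Series of the most frequent value(s), NaN excluded by default
modes = term.mode()

# Take the first mode (modes are returned in sorted order if there is a tie)
most_frequent = modes.iloc[0]

# Count how often that value occurs
count = int((term == most_frequent).sum())

print(most_frequent, count)
```

Because Series.mode() can return more than one value when there is a tie, taking the first entry is a choice made for this sketch; you may prefer a different tie-breaking rule.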
dataset['Loan_Amount_Term'].fillna(mode(dataset['Loan_Amount_Term']).mode[0], inplace=True)
Now count the number of missing values in this variable:
dataset['Loan_Amount_Term'].isnull().sum()
The result will be zero, so all the missing values have been imputed with the most frequent value in the variable.
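The whole procedure can also be wrapped in a small helper. The sketch below is one possible way to do it, not the method used above: impute_with_mode is a hypothetical function name, it uses pandas' Series.mode() rather than scipy, and it assigns the filled column back instead of using inplace=True, since in-place fills on a selected column may not update the DataFrame in newer pandas versions. The toy data is made up.

```python
import numpy as np
import pandas as pd


def impute_with_mode(df, column):
    """Return a copy of df with NaN in `column` replaced by that column's mode.

    Hypothetical helper for illustration; assumes the column has at least
    one non-missing value.
    """
    result = df.copy()
    most_frequent = result[column].mode().iloc[0]
    # Assign the filled column back rather than filling in place
    result[column] = result[column].fillna(most_frequent)
    return result


# Hypothetical toy data (values are made up)
toy = pd.DataFrame({"Loan_Amount_Term": [360.0, np.nan, 180.0, 360.0, np.nan]})
imputed = impute_with_mode(toy, "Loan_Amount_Term")

print(imputed["Loan_Amount_Term"].isnull().sum())
```

Returning a copy leaves the original DataFrame untouched, which makes it easy to compare the column before and after imputation. If every value in the column is missing, mode() returns an empty Series and the helper raises an IndexError.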