### Frequency Table: How to use the pandas value_counts() function to impute missing values?

The **value_counts()** function is part of the pandas library and is very useful in the data wrangling step. It summarizes the frequency distribution of the values in a variable by building a frequency table.

The **value_counts()** function returns a Series containing the **counts of unique values**. The result is sorted in **descending order**, so the first element is the most frequently occurring value. **NA values are excluded by default.**
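To make this behaviour concrete, here is a minimal sketch on a small made-up Series (the values are hypothetical and not taken from the loan dataset). It shows the default call next to `dropna=False`, which counts missing values as their own category.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the behaviour
s = pd.Series(["No", "Yes", "No", np.nan, "No", "Yes", np.nan])

# Default: missing values are excluded, most frequent value comes first
counts = s.value_counts()

# dropna=False: missing values are counted as a separate category
counts_with_na = s.value_counts(dropna=False)

print(counts)
print(counts_with_na)
```

Comparing the two results is a quick way to see how many entries are missing alongside the regular categories.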

Because it shows which value dominates a column, the pandas value_counts() function is very useful when imputing missing values in a categorical variable.

Consider a Loan Prediction dataset. We will try to impute the missing values in the **Self_Employed** variable.

**Step 1: Import the required libraries**

import pandas as pd
import numpy as np

**Step 2: Load the dataset**

dataset = pd.read_csv("C:/train_loan_prediction.csv")
dataset.shape
Output: (614, 13)

So, this dataset has **614** observations. The Self_Employed column contains either "Yes" or "No". Let's see how many missing values there are in this column:

dataset["Self_Employed"].isnull().sum()
Output: **32**

So, this column contains 32 missing observations (out of 614 in total). Now let's use the pandas **value_counts()** function to count the number of "Yes" and "No" values.

dataset['Self_Employed']**.value_counts()**
Output:
**No 500**
**Yes 82**

We can clearly see that out of the 582 non-missing observations, 500 are "No", which is around 86%. So we can reasonably impute "No" for the 32 missing values.
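The percentage can also be read directly from pandas by passing `normalize=True`. The sketch below does not load the CSV. Instead it rebuilds a stand-in column from the counts reported above (500 "No", 82 "Yes", 32 missing), so the variable name `col` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for dataset["Self_Employed"], rebuilt from the reported counts
col = pd.Series(["No"] * 500 + ["Yes"] * 82 + [np.nan] * 32, name="Self_Employed")

# normalize=True returns proportions of the non-missing values instead of counts
shares = col.value_counts(normalize=True)

print(shares.round(3))
```

Since missing values are excluded by default, the proportions are computed over the 582 non-missing rows, which matches the 86% figure.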

dataset['Self_Employed'] = dataset['Self_Employed'].**fillna**('No')
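Rather than hard-coding 'No', the most frequent value can be taken from the frequency table itself. This is a minimal sketch on a small hypothetical DataFrame (the name `df` and its rows are made up for illustration). It assigns the filled column back instead of relying on `inplace=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature version of the column, for illustration only
df = pd.DataFrame({"Self_Employed": ["No", "Yes", np.nan, "No", np.nan, "No"]})

# value_counts() sorts in descending order, so the first index label is the mode
most_common = df["Self_Employed"].value_counts().index[0]

# Fill the missing entries and assign the result back to the column
df["Self_Employed"] = df["Self_Employed"].fillna(most_common)

print(df["Self_Employed"].value_counts())
```

This kind of most-frequent-value imputation is most defensible when one category clearly dominates, as it does here.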

So, in this way, we can use value_counts() to decide which value to impute for missing entries.