The **value_counts()** function in the pandas library is very useful in the data wrangling step. It summarizes the frequency distribution of a variable by returning a frequency table of its values.

The **value_counts()** function returns a Series containing **counts of unique values**. The resulting object is sorted in **descending order**, so the first element is the most frequently occurring value. It **excludes NA values by default**.
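To make these three properties concrete, here is a minimal sketch on a small made-up Series (the values are illustrative and not taken from the loan dataset). It also uses the `dropna=False` option, which asks value_counts() to count missing values as a category of their own.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): three "No", one "Yes", one missing value
s = pd.Series(["No", "Yes", "No", np.nan, "No"])

# Default behaviour: NaN is excluded and counts are sorted in descending order
counts = s.value_counts()

# dropna=False keeps NaN as its own entry in the frequency table
counts_with_na = s.value_counts(dropna=False)

print(counts)
print(counts_with_na)
```

Comparing the two results is a quick way to see how many values are missing next to the regular categories.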

Because it shows which value dominates a column, pandas value_counts() is a handy guide when imputing missing values.

Consider a Loan Prediction dataset. We will try to impute the missing values in the **Self_Employed** variable.

**Step 1: Import the required libraries**

import pandas as pd

import numpy as np

**Step 2: Load the dataset**

dataset = pd.read_csv("C:/train_loan_prediction.csv")

dataset.shape

Output: (614, 13)

So, this dataset has **614** observations. The Self_Employed column contains either "Yes" or "No". Let's see how many missing values there are in this column:

dataset["Self_Employed"].isnull().sum()

Output:

**32**

So, this column contains 32 missing observations (out of 614 in total). Now let's use the pandas **value_counts()** function to count the number of "Yes" and "No" values.

dataset['Self_Employed'].value_counts()

Output:

**No 500**

**Yes 82**

We can clearly observe that out of 582 non-missing observations, 500 are "No", which is around 86%. So we can reasonably impute "No" for the 32 missing values.
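The 86% figure can be checked directly from the counts above, and value_counts() can also report shares instead of raw counts through its `normalize=True` option. The sketch below first redoes the arithmetic with the 500 and 82 from the output, then shows `normalize=True` on a small made-up column (the `col` Series is a hypothetical stand-in, not the real data).

```python
import numpy as np
import pandas as pd

# Share of "No" among the non-missing values, using the counts shown above
share_no = 500 / (500 + 82)
print(round(share_no * 100))  # percentage, rounded to the nearest integer

# Hypothetical stand-in column: six "No", one "Yes", one missing value
col = pd.Series(["No"] * 6 + ["Yes"] + [np.nan])

# normalize=True returns proportions of the non-missing values
shares = col.value_counts(normalize=True)
print(shares)
```

Since missing values are excluded by default, the proportions always sum to 1 over the observed categories.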

dataset['Self_Employed'] = dataset['Self_Employed'].fillna('No')

So, in this way, we can use value_counts to impute missing values.
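Instead of typing the fill value by hand, the most frequent category can be read off the value_counts() result itself. The following is a minimal sketch under the assumption that imputing with the most common value is acceptable for the column; the small `df` DataFrame is made up for illustration and is not the loan dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature version of the column: four observed values, two missing
df = pd.DataFrame({"Self_Employed": ["No", "Yes", "No", np.nan, "No", np.nan]})

# idxmax() returns the label with the highest count, i.e. the most frequent value
most_common = df["Self_Employed"].value_counts().idxmax()

# Assign the filled column back rather than modifying it in place
df["Self_Employed"] = df["Self_Employed"].fillna(most_common)

print(most_common)
print(df["Self_Employed"].isnull().sum())
```

This keeps the imputation in step with the data: if the dominant category changes, the fill value changes with it.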
