Sunday, 7 April 2019

How to use Pandas Lambda Functions for Data Wrangling?

Python lambda functions are very handy with pandas during the data wrangling step. In this post we will see how to apply lambda functions to a pandas DataFrame. Sometimes we need a very short function that we do not intend to reuse, so giving it a name is unnecessary. Instead of defining a full-fledged function, we can just write a lambda function.

Consider a Loan Prediction dataset. We will count the number of missing values and the number of unique values in each variable of this dataset using lambda functions.

Step 1: Import the required libraries

import pandas as pd
import numpy as np

Step 2: Load the dataset

dataset = pd.read_csv("C:/train_loan_prediction.csv")

Step 3: Find out missing values

To count the missing values in each variable, we use the following code:

dataset.isnull().sum()

We can also use a lambda function to achieve this. The code above can be rewritten like this:

dataset.apply(lambda x: sum(x.isnull()))
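A quick way to convince yourself that the two forms agree is to run both on a small frame. The sketch below uses a made-up DataFrame named sample; its column names and values are illustrative assumptions, not rows from the loan dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; not the real loan dataset.
sample = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "LoanAmount": [120.0, np.nan, np.nan, 66.0],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
})

# Built-in form: build a boolean mask, then sum it column by column.
missing_builtin = sample.isnull().sum()

# Lambda form: apply() passes each column (a Series) to the lambda as x.
missing_lambda = sample.apply(lambda x: sum(x.isnull()))

print(missing_builtin)
print(missing_lambda)
```

Both calls should report the same count for every column. On a large dataset the built-in form is usually the faster of the two, since the lambda version sums each column with Python's own sum().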

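Let’s try to understand the lambda function above:

lambda - the keyword that starts an anonymous (unnamed) function
x - the parameter name; apply works column by column by default, so x is one column of the dataset at a time
sum(x.isnull()) - the expression that is evaluated on the parameter and returned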
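To make the pieces concrete, here is a minimal sketch that writes the same logic once as a named function and once as a lambda. The function name count_missing and the sample data are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000],
    "LoanAmount": [np.nan, 128.0, 66.0],
})

def count_missing(column):
    """Return the number of missing entries in one column (a Series)."""
    return sum(column.isnull())

# Named function passed to apply().
named_result = sample.apply(count_missing)

# The same logic written inline as a lambda.
lambda_result = sample.apply(lambda x: sum(x.isnull()))

print(named_result.equals(lambda_result))
```

The lambda simply saves us from defining and naming count_missing when the function is only needed in this one place.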

This pattern might look strange at first, but you will enjoy writing lambda functions once you get comfortable with them. Let's practice with one more example and find out how many unique values there are in each variable.

dataset.apply(lambda x: len(x.unique()))

The code above goes through the variables one by one and returns the count of unique values in each column. For example, for the "Education" column it returns 2, as there are two unique categories (Graduate and Not Graduate).
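One detail worth knowing: x.unique() treats a missing value as a value of its own, so columns with missing entries are counted one higher than you might expect. The sketch below compares this with nunique(), which skips missing values by default. The sample data is made up, and the column name Credit_History is used here only as an illustrative assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; Credit_History contains one missing value.
sample = pd.DataFrame({
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
    "Credit_History": [1.0, np.nan, 0.0, 1.0],
})

# len(x.unique()) counts NaN as one of the unique values.
unique_with_nan = sample.apply(lambda x: len(x.unique()))

# x.nunique() ignores NaN by default (dropna=True).
unique_without_nan = sample.apply(lambda x: x.nunique())

print(unique_with_nan)
print(unique_without_nan)
```

If you want the count without missing values, sample.nunique() gives the same result as the second lambda without needing apply at all.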
