Tuesday, 9 April 2019

Boolean Indexing: How to Filter a Pandas DataFrame

We can easily select any subset of rows from a pandas DataFrame. For example, we can filter the values of one column based on conditions on other columns. Boolean indexing is very useful here: a condition produces a True or False value for every row, and only the rows marked True are kept.
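Here is a minimal sketch of that idea. It uses a tiny hypothetical DataFrame (the columns `name` and `score` and all values are made up for illustration): comparing one column to a value gives a boolean Series, and passing that Series to `.loc` keeps only the matching rows, from which we can then pick a different column.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "score": [72, 45, 88, 60],
})

# Comparing a column to a value yields a boolean Series: one True/False per row
mask = df["score"] >= 60
print(mask.tolist())  # [True, False, True, True]

# .loc[row_mask, column] keeps the rows where the mask is True and
# returns the "name" column for those rows only
passed = df.loc[mask, "name"]
print(passed.tolist())  # ['Ann', 'Cid', 'Dee']
```

The condition is on `score` while the values returned come from `name`, which is exactly "filtering one column based on conditions on another".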

Consider the Loan Prediction dataset. We will filter its rows based on a few conditions using boolean indexing.

Step 1: Import the required libraries

import pandas as pd
import numpy as np

Step 2: Load the dataset

dataset = pd.read_csv("C:/train_loan_prediction.csv")
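Because `==` compares strings exactly (case and spaces matter), it can help to check which labels each column actually contains before writing a condition. The sketch below does this on a small hypothetical stand-in for `train_loan_prediction.csv`: only the three column names come from this post, and every value is invented. With the real file, you would run the same loop on the `dataset` returned by `read_csv`.

```python
import pandas as pd

# Hypothetical stand-in for train_loan_prediction.csv: same column names
# as used in this post, but all values are made up. None marks a missing value.
dataset = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Female", "Male", None],
    "Education": ["Graduate", "Not Graduate", "Graduate",
                  "Not Graduate", "Not Graduate", "Graduate"],
    "Loan_Status": ["Y", "Y", "N", "Y", "N", "Y"],
})

# List every distinct label and how often it occurs;
# dropna=False also counts missing values instead of hiding them
for col in ["Gender", "Education", "Loan_Status"]:
    print(dataset[col].value_counts(dropna=False), "\n")
```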

Step 3: Filter data using boolean indexing

Suppose we want a list of all female applicants who are not graduates and whose loan was approved. Let's use boolean indexing to filter the data with the following code:

dataset.loc[(dataset["Gender"] == "Female")
            & (dataset["Education"] == "Not Graduate")
            & (dataset["Loan_Status"] == "Y"),
            ["Gender", "Education", "Loan_Status"]]

The code above selects all female applicants who are not graduates and whose loan was approved (Loan_Status "Y"). It displays only three columns, "Gender", "Education" and "Loan_Status", but you can list as many columns as you need. For practice, try filtering the data with other conditions.
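To see why each condition sits in its own parentheses, the sketch below builds the three masks separately and combines them with `&`, then tries a few other conditions for practice: `|` (or), `~` (not) and `Series.isin`. The parentheses are required because `&` and `|` bind more tightly than `==` in Python. The sketch runs on a small hypothetical stand-in with invented values (only the column names come from this post); with the real file, use the `dataset` loaded by `read_csv` instead.

```python
import pandas as pd

# Hypothetical stand-in data: invented values, same column names as the post.
# None marks a missing Gender value.
dataset = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Female", "Male", None],
    "Education": ["Graduate", "Not Graduate", "Graduate",
                  "Not Graduate", "Not Graduate", "Graduate"],
    "Loan_Status": ["Y", "Y", "N", "Y", "N", "Y"],
})
cols = ["Gender", "Education", "Loan_Status"]

# Each comparison is a boolean Series (missing values compare as False)
is_female = dataset["Gender"] == "Female"
not_graduate = dataset["Education"] == "Not Graduate"
approved = dataset["Loan_Status"] == "Y"

# & is element-wise AND: same result as the one-line filter in Step 3
female_ng_approved = dataset.loc[is_female & not_graduate & approved, cols]
print(female_ng_approved)

# | is element-wise OR: applicants who are female OR were approved
female_or_approved = dataset.loc[is_female | approved, cols]

# ~ negates a mask: applicants whose loan was NOT approved
rejected = dataset.loc[~approved, cols]

# isin() tests membership in a list; rows with a missing Gender are dropped
known_gender = dataset.loc[dataset["Gender"].isin(["Male", "Female"]), cols]
```

Naming the masks keeps long filters readable, and each one can be inspected on its own before they are combined.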
