Monday, 8 April 2019

Sorting datasets based on multiple columns using sort_values

You can easily sort a dataset by one or more columns. First load your data into a pandas DataFrame, then call the sort_values function on it.
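As a quick illustration of the single-column case, here is a minimal sketch. The toy DataFrame and its values are made up for this example; only the ApplicantIncome column name is borrowed from the dataset used later in this post.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
toy = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000, 2583],
    "LoanAmount": [128, 66, 120, 141],
})

# Passing one column name sorts by that column alone (ascending by default).
# sort_values returns a new DataFrame; `toy` itself is left unchanged.
toy_sorted = toy.sort_values("ApplicantIncome")
print(toy_sorted)
```

Note that the sorted result keeps the original row labels, so the index will no longer be in order after sorting.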

Consider a Loan Prediction dataset. We will sort it by Applicant and Coapplicant income, in both ascending and descending order.

Step 1: Import the required libraries

import pandas as pd
import numpy as np

Step 2: Load the dataset

dataset = pd.read_csv("C:/train_loan_prediction.csv")

Step 3: Sort the dataset in ascending and descending order

Let's sort our dataset by Applicant Income and Coapplicant Income.

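# Sort by ApplicantIncome first, then by CoapplicantIncome (ascending by default)
dataset_sorted = dataset.sort_values(['ApplicantIncome','CoapplicantIncome'])
# Show the two sort columns for the first 50 rows of the sorted result
dataset_sorted[['ApplicantIncome','CoapplicantIncome']].head(50)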

The code above sorts the dataset by ApplicantIncome first and uses CoapplicantIncome to break ties between rows with the same ApplicantIncome. Both columns are sorted in ascending order, which is the default.
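To make the role of the second column concrete, here is a small sketch on made-up data where two applicants share the same income. The values are hypothetical and chosen only so that ties occur.

```python
import pandas as pd

# Hypothetical toy data with repeated ApplicantIncome values.
toy = pd.DataFrame({
    "ApplicantIncome": [3000, 2500, 3000, 2500],
    "CoapplicantIncome": [1500.0, 0.0, 0.0, 2000.0],
})

# Rows are ordered by ApplicantIncome; CoapplicantIncome only decides
# the order among rows that have the same ApplicantIncome.
by_both = toy.sort_values(["ApplicantIncome", "CoapplicantIncome"])
print(by_both)
```

The two rows with an income of 2500 come first, ordered by their coapplicant income, followed by the two rows with an income of 3000.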

To sort the dataset in descending order, pass ascending=False to the sort_values function like this:

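# ascending=False applies to both columns: highest incomes come first
dataset_sorted = dataset.sort_values(['ApplicantIncome','CoapplicantIncome'], ascending=False)
# Show the two sort columns for the first 50 rows of the sorted result
dataset_sorted[['ApplicantIncome','CoapplicantIncome']].head(50)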

The rows now appear in descending order, so applicants with the highest income are listed first.
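The ascending parameter also accepts a list with one flag per column, which lets each column use its own direction. The sketch below goes a step beyond the example in this post: it uses made-up data and sorts ApplicantIncome from high to low while breaking ties with CoapplicantIncome from low to high. The reset_index call is optional and is included only to renumber the rows after sorting.

```python
import pandas as pd

# Hypothetical toy data with repeated ApplicantIncome values.
toy = pd.DataFrame({
    "ApplicantIncome": [3000, 2500, 3000, 2500],
    "CoapplicantIncome": [1500.0, 0.0, 0.0, 2000.0],
})

# One flag per sort column: ApplicantIncome descending,
# CoapplicantIncome ascending within equal ApplicantIncome values.
mixed = toy.sort_values(
    ["ApplicantIncome", "CoapplicantIncome"],
    ascending=[False, True],
).reset_index(drop=True)  # renumber rows 0..n-1 in the new order
print(mixed)
```

The list passed to ascending must have the same length as the list of column names.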
