Tuesday, 9 April 2019

How to calculate the Mean and Median of numeric variables using the Pandas library?

The describe() function summarizes every numeric column in the dataset: count, mean, standard deviation, minimum, maximum, and the 25th, 50th (median) and 75th percentiles. Let's explore it in more detail.

Consider a Loan Prediction dataset. We will look at the summary statistics of all its numeric variables, and we will also calculate the mean and median explicitly.

Step 1: Import the required libraries

import pandas as pd
import numpy as np

Step 2: Load the dataset

dataset = pd.read_csv("C:/train_loan_prediction.csv")

Step 3: Calculate mean and median

Run the statement below and inspect the results:

dataset.describe()

This returns summary statistics for all the numeric columns and skips the non-numeric ones. The median appears in the row labelled 50%. If you want the summary for only a single column such as ApplicantIncome, use the statement below:

dataset['ApplicantIncome'].describe()
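
If you do not have the CSV file at hand, here is a minimal sketch of what describe() does, run on a tiny made-up table. The values and the Gender and LoanAmount columns are assumptions for illustration only, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the loan dataset; all values are made up.
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],   # non-numeric column
    "ApplicantIncome": [5000, 3000, 4000, 6000],    # numeric column
    "LoanAmount": [120.0, 100.0, None, 140.0],      # numeric column with one missing value
})

# Summary of all numeric columns; the Gender column is left out by default.
summary = sample.describe()
print(summary)

# Summary of a single column; the "50%" entry is the median.
column_summary = sample["ApplicantIncome"].describe()
print(column_summary)
```

Note that the count row only counts non-missing values, so it is a quick way to spot columns that have gaps.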

You can get the mean and median explicitly using the following statements:

dataset['ApplicantIncome'].mean()
dataset['ApplicantIncome'].median()
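
To see why it is worth looking at both numbers, here is a small sketch with made-up income values, one of which is very large. The data is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical incomes; the last value is a deliberate outlier.
income = pd.Series([2000, 3000, 4000, 5000, 36000], name="ApplicantIncome")

mean_income = income.mean()      # pulled upwards by the outlier
median_income = income.median()  # middle value of the sorted data

print("Mean:", mean_income)
print("Median:", median_income)

# The median is the same number that describe() reports as "50%".
same_as_describe = income.describe()["50%"] == median_income
print("Matches describe():", same_as_describe)
```

When the mean is much larger than the median, as it is here, the column is probably skewed by a few large values.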

You can use the above mean and median values to impute missing values in the variable. The median is less affected by outliers, so it is often the safer choice for a skewed column.
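
As a rough sketch of such an imputation, the snippet below fills the gaps with the median using fillna(). The small table is made up for illustration, and using fillna() this way is just one possible approach.

```python
import pandas as pd

# Hypothetical data with two missing incomes; values are made up.
sample = pd.DataFrame({"ApplicantIncome": [100.0, None, 120.0, 500.0, None]})

# mean() and median() skip missing values by default.
median_income = sample["ApplicantIncome"].median()

# Replace every missing value with the median and assign the result back.
sample["ApplicantIncome"] = sample["ApplicantIncome"].fillna(median_income)

print(sample)
```

To impute with the mean instead, replace median() with mean() in the snippet.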
