The describe() function summarizes every numeric column in the dataset: count, mean, standard deviation, minimum, maximum, and the 25th, 50th (median) and 75th percentiles. Let's explore it in more detail.
Consider a Loan Prediction dataset. We will view the summary statistics for all of its numeric variables and also calculate the mean and median explicitly.
Step 1: Import the required libraries
import pandas as pd
import numpy as np
Step 2: Load the dataset
dataset = pd.read_csv("C:/train_loan_prediction.csv")
Step 3: Calculate mean and median
Run the following statement and look at the output:
dataset.describe()
This returns summary statistics for every numeric column and leaves out the non-numeric ones. To summarize a single column such as ApplicantIncome, use the statement below:
dataset['ApplicantIncome'].describe()
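If you do not have the CSV file at hand, the same behaviour can be reproduced on a small made-up table. The sketch below is only an illustration: the DataFrame `toy` and its values are invented, and the LoanAmount and Gender columns are assumed for the example rather than taken from the actual dataset. It shows that describe() keeps only numeric columns by default, that missing values are not counted, and that passing include="all" also brings in the text columns.

```python
import pandas as pd

# Hypothetical toy data standing in for the loan dataset (values are made up)
toy = pd.DataFrame({
    "ApplicantIncome": [2500, 4000, 6000, 3000],
    "LoanAmount": [100.0, 120.0, None, 90.0],   # one missing value
    "Gender": ["Male", "Female", "Male", "Male"],
})

# Default behaviour: only numeric columns are summarized
numeric_summary = toy.describe()

# include="all" adds non-numeric columns (with rows such as unique, top, freq)
full_summary = toy.describe(include="all")

print(numeric_summary)
print(full_summary)
```

Note that the count row for LoanAmount is 3, not 4, because describe() ignores the missing entry.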
You can also get the mean and median explicitly using the following statements:
dataset['ApplicantIncome'].mean()
dataset['ApplicantIncome'].median()
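These two calls return the same numbers that describe() reports in its mean and 50% rows. A minimal sketch with made-up income values (the Series `income` is hypothetical, not data from the loan file) shows this, and also shows how a single very large value pulls the mean up while the median barely moves:

```python
import pandas as pd

# Made-up incomes; the last value is a deliberate outlier
income = pd.Series([2500, 4000, 6000, 3000, 50000], name="ApplicantIncome")

summary = income.describe()
mean_value = income.mean()      # 13100.0, pulled up by the outlier
median_value = income.median()  # 4000.0, the middle value after sorting

# The explicit calls match the corresponding rows of describe()
print(mean_value == summary["mean"])
print(median_value == summary["50%"])
```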
You can use the mean or median values above to impute missing values in the variable.
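One common way to do this imputation is with fillna(). The sketch below is a minimal example under assumptions: the Series `loan` and its values are invented, and LoanAmount is used only as a plausible column name. The median is chosen here because it is less sensitive to extreme values; the mean can be swapped in the same way.

```python
import numpy as np
import pandas as pd

# Made-up loan amounts with two missing entries
loan = pd.Series([100.0, 120.0, np.nan, 90.0, np.nan, 400.0], name="LoanAmount")

# median() skips NaN by default, so it is computed from 90, 100, 120 and 400
median_value = loan.median()

# Replace every missing entry with the median; the original Series is unchanged
filled = loan.fillna(median_value)

print(median_value)
print(filled)
```

On the real data the same pattern would be applied to a column of the dataset, assigning the result back to that column.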