Wednesday, 10 April 2019

How to encode and transform all the categorical variables to numeric variables using LabelEncoder?

Most Machine Learning algorithms require numeric inputs, so we should convert all our categorical variables into numeric variables by encoding the categories. Before that, please make sure that you have imputed all the missing values in the categorical variables. We will use LabelEncoder, which is available in the scikit-learn library (sklearn.preprocessing), to encode and transform the categorical variables. LabelEncoder replaces each category with an integer code (0, 1, 2, ...) assigned in sorted order of the category values.
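The imputation mentioned above can be done in several ways. The snippet below is only a minimal sketch, assuming you are happy to fill missing categories with the most frequent value (the mode); the small `toy` DataFrame is a hypothetical stand-in for the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Married": ["Yes", np.nan, "No", "Yes"],
})

# Fill the missing values of each categorical column with its most frequent value
for col in ["Gender", "Married"]:
    toy[col] = toy[col].fillna(toy[col].mode()[0])

# Count the missing values that remain across the whole frame
missing_after = int(toy.isnull().sum().sum())
print(missing_after)
```

Mode imputation is just one option. Depending on your data, a separate "Unknown" category may be a better choice.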

Consider a Loan Prediction dataset. We will encode and transform all the categorical variables to numeric variables.

Step 1: Import the required libraries

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

Step 2: Load the dataset

dataset = pd.read_csv("C:/train_loan_prediction.csv")

Step 3: Encode categorical variables using LabelEncoder

The categorical variables are Gender, Married, Dependents, Education, Self_Employed, Property_Area and Loan_Status. Let's encode and transform all these categorical variables to numeric variables in one go using the following Python code.

categorical_vars = ['Gender','Married','Dependents','Education','Self_Employed','Property_Area','Loan_Status']
label_encoder = LabelEncoder()
for i in categorical_vars:
    dataset[i] = label_encoder.fit_transform(dataset[i])
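Note that a single LabelEncoder object is refitted on every pass of the loop, so after the loop it only remembers the categories of the last column. If you want to look up or reverse the mapping later, one possible approach is to keep a separate encoder per column. This is a sketch, not the only way to do it; the `encoders` dictionary and the `toy` DataFrame are hypothetical names used for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "Education": ["Graduate", "Not Graduate", "Graduate"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
})

# Keep one fitted encoder per column so each mapping stays available
encoders = {}
for col in toy.columns:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

# classes_ lists the original categories; the position of each is its code
area_classes = [str(c) for c in encoders["Property_Area"].classes_]

# inverse_transform maps the integer codes back to the original categories
decoded = [str(v) for v in encoders["Property_Area"].inverse_transform(toy["Property_Area"])]

print(area_classes)
print(decoded)
```

Keeping the fitted encoders around also lets you apply the same mapping to new data with `transform` instead of fitting again.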

Now, look at the datatypes of variables:

dataset.dtypes 

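You will see that the datatype of all the categorical variables has changed from object to an integer type such as int32 or int64, depending on your platform. Columns that were already numeric keep their original datatype, for example float64. So, now our dataset is ready for Machine Learning algorithms.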
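If you prefer to check the datatypes in code rather than by eye, something like the following should work. It is a small sketch on a hypothetical `toy` DataFrame with made-up values, using the dtype helpers in `pd.api.types`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: one categorical column and one numeric column
toy = pd.DataFrame({
    "Loan_Status": ["Y", "N", "Y"],
    "LoanAmount": [128.0, 66.0, 120.0],
})

# Encode only the categorical column
toy["Loan_Status"] = LabelEncoder().fit_transform(toy["Loan_Status"])

# The encoded column holds integers; the numeric column is left untouched
is_int = pd.api.types.is_integer_dtype(toy["Loan_Status"])
is_float = pd.api.types.is_float_dtype(toy["LoanAmount"])

print(toy.dtypes)
```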

Related: Difference between Label Encoder and One Hot Encoder
