Monday, 8 April 2019

How to view and change datatypes of variables or features in a dataset?

The astype function changes the datatype of a variable (column) in a pandas DataFrame. When we load data into a pandas DataFrame, a datatype is automatically assigned to each variable based on the data it contains.
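
As a rough illustration of this automatic assignment, the sketch below builds a small DataFrame by hand instead of loading a file. The column names echo the loan data, but the values are made up for the example.

```python
import pandas as pd

# Made-up rows; only the column names echo the loan dataset.
sample = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000],        # whole numbers
    "Credit_History": [1.0, 0.0, float("nan")],   # numbers with a missing value
})

# pandas infers a datatype for each column from its contents.
print(sample.dtypes)
```

With pandas' default settings, ApplicantIncome should come out as int64 and Credit_History as float64, since a column holding a missing value is stored as floating point.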

Consider a Loan Prediction dataset. We will change the datatype of the Credit_History variable.

Step 1: Import the required libraries

import pandas as pd
import numpy as np

Step 2: Load the dataset

dataset = pd.read_csv("C:/train_loan_prediction.csv")

Step 3: Find datatype of all the variables

dataset.info()
dataset.dtypes
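
If you only care about one variable, note that dtypes returns a pandas Series indexed by column name, so a single entry can be looked up directly. A minimal sketch, using a hand-made DataFrame (hypothetical values) in place of the real file:

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset.
sample = pd.DataFrame({
    "ApplicantIncome": [5849, 4583],
    "Credit_History": [1.0, 0.0],
})

all_dtypes = sample.dtypes                       # Series: column name -> dtype
credit_dtype = all_dtypes["Credit_History"]      # look up one column
same_dtype = sample["Credit_History"].dtype      # equivalent, via the column itself

print(credit_dtype)
```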

Step 4: Change the datatype of a variable

We observe that Credit_History is a nominal (categorical) variable, yet it is identified as float64 because it contains numbers. Ideally, it should be of object type, since it is a categorical variable.

So, we can change the datatype of this variable using the following Python code:

# Use the built-in object type; the np.object alias was removed in NumPy 1.24.
dataset['Credit_History'] = dataset['Credit_History'].astype(object)

Now print the datatypes and examine the results.

dataset.info()
dataset.dtypes

You will see the datatype of Credit_History has been changed from float64 to object.
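
To try the conversion without the CSV file, here is a small self-contained sketch of the same steps. The DataFrame and its values are made up for illustration; only the column name comes from the dataset above.

```python
import pandas as pd

# Made-up values standing in for the real Credit_History column.
sample = pd.DataFrame({"Credit_History": [1.0, 0.0, float("nan"), 1.0]})

before = sample["Credit_History"].dtype   # inferred when the DataFrame is built

# Same conversion as in Step 4, applied to the stand-in data.
sample["Credit_History"] = sample["Credit_History"].astype(object)

after = sample["Credit_History"].dtype

print(before, "->", after)
```

Keep in mind that astype(object) only changes how pandas labels and stores the column. The individual values are still floats such as 1.0 and 0.0, so any later step that needs text labels would have to convert them separately.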