Before applying machine learning algorithms to a dataset, we need to standardize and transform its features. Why is this required? Let's try to understand with an example:

Consider an Employee dataset with features such as Employee Age and Employee Salary. The AGE feature takes values in the range 25-60, while SALARY takes values in the range 10000-100000. Because the two features are on very different scales, they should be brought to a common scale before building machine learning models. Such a large difference in scale can adversely affect the performance of an algorithm, for example one that relies on distances between data points. So, we need to standardize these features.
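To make the scale problem concrete, here is a small sketch using three made-up employees (the values and the variable names are assumptions chosen for illustration, not data from a real dataset). It measures the plain Euclidean distance between unscaled (age, salary) pairs, which is the kind of quantity a distance-based algorithm would work with.

```python
import math

# Hypothetical employees as (age, salary) pairs; values are invented for illustration.
a = (25.0, 50000.0)
b = (60.0, 50500.0)  # very different age, almost the same salary
c = (26.0, 60000.0)  # almost the same age, different salary

# Euclidean distance on the raw, unscaled features.
dist_ab = math.dist(a, b)
dist_ac = math.dist(a, c)

# Share of the squared a-b distance that comes from the age difference.
age_share_ab = (a[0] - b[0]) ** 2 / dist_ab ** 2

print(f"distance a-b: {dist_ab:.2f}")
print(f"distance a-c: {dist_ac:.2f}")
print(f"age share of squared a-b distance: {age_share_ab:.4f}")
```

On the raw values, employee a looks much closer to b than to c even though a and b are 35 years apart, because the salary axis dominates the distance.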

The idea behind standardization is to transform your data so that the distribution of each feature has a mean of 0 and a standard deviation of 1.
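The transformation can be sketched by hand using only the Python standard library. The sample values and the `standardize` helper below are assumptions made for illustration. The sketch uses the population standard deviation (dividing by n), which to my understanding is also what scikit-learn's StandardScaler uses.

```python
from statistics import fmean, pstdev

# Made-up sample values on the two scales discussed above.
ages = [25.0, 35.0, 45.0, 60.0]
salaries = [10000.0, 40000.0, 70000.0, 100000.0]

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (divides by n)
    return [(v - mu) / sigma for v in values]

ages_scaled = standardize(ages)
salaries_scaled = standardize(salaries)

print(ages_scaled)
print(salaries_scaled)
# After standardization both features have mean ~0 and standard deviation ~1.
print(fmean(ages_scaled), pstdev(ages_scaled))
print(fmean(salaries_scaled), pstdev(salaries_scaled))
```

After this step both features live on a comparable scale, regardless of their original units.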



**MEAN = 0**

**STANDARD DEVIATION = 1**

Given the distribution of the data, **each value in the dataset will have the sample mean value subtracted, and then divided by the standard deviation of the whole dataset**.

**How to implement Standardization using Scikit Learn Library in Python?**

**StandardScaler** performs the task of **Standardization**. First of all, you need to import StandardScaler:

**from sklearn.preprocessing import StandardScaler**

Assuming you have split your dataset like this:

**X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)**

Transform your X_train and X_test features like this:

**scaler = StandardScaler()**

**X_train = scaler.fit_transform(X_train)**

**X_test = scaler.transform(X_test)**
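Putting the steps together, a minimal end-to-end sketch might look like the following. The synthetic age and salary data, the placeholder target `y`, and the `_scaled` variable names are assumptions added so the example runs on its own. The `train_test_split` import from `sklearn.model_selection` is also added here because the split call needs it.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic Employee data (assumed for illustration): age in 25-60, salary in 10000-100000.
rng = np.random.default_rng(0)
n_samples = 200
age = rng.integers(25, 61, size=n_samples)
salary = rng.integers(10000, 100001, size=n_samples)
X = np.column_stack([age, salary]).astype(float)
y = rng.integers(0, 2, size=n_samples)  # placeholder binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

scaler = StandardScaler()
# Learn the mean and standard deviation from the training set only, then apply them.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the test set; do not refit here.
X_test_scaled = scaler.transform(X_test)

print("learned means:", scaler.mean_)
print("learned standard deviations:", scaler.scale_)
print("train mean/std:", X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print("test mean/std:", X_test_scaled.mean(axis=0), X_test_scaled.std(axis=0))
```

The scaled training features have a mean of 0 and a standard deviation of 1 per column. The test features are only approximately so, because they are transformed with the statistics learned from the training set. That is why the test set goes through `transform` rather than `fit_transform`.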