Thursday, 4 April 2019

Data Exploration: Univariate, Bivariate and Multivariate Analysis

Data exploration is how we get insights from data. A good data exploration strategy is key to solving complicated problems in machine learning.

We can't determine everything just by looking at the raw data; we need to dig deeper. This step helps us understand the nature of each variable (skewed distribution, missing values, zero variance) so that it can be treated properly. It involves creating charts and graphs (univariate and bivariate analysis) and cross tables to understand the behavior of features.
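As a rough illustration of those three checks, here is a minimal sketch using pandas. The DataFrame, its column names, and its values are made up for this example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "income": [20, 22, 25, 27, 30, 200],            # one extreme value, so right-skewed
    "age": [25, float("nan"), 31, 40, float("nan"), 52],  # two missing values
    "country": ["IN"] * 6,                          # a single value, so zero variance
})

# Missing values per column.
missing_counts = df.isna().sum()

# Sample skewness of the numeric columns (positive means a long right tail).
skewness = df.skew(numeric_only=True)

# Columns with at most one distinct value carry no information.
constant_cols = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]

print(missing_counts)
print(skewness)
print(constant_cols)
```

What counts as "too skewed" or "too many missing values" depends on the problem, so any cut-off you apply to these numbers is a judgment call rather than a fixed rule.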

A good data exploration strategy comprises the following:

Univariate Analysis - It is used to visualize one variable in one plot. Examples: histogram, density plot, etc.

Bivariate Analysis - It is used to visualize two variables (one on the x-axis, one on the y-axis) in one plot. Examples: bar chart, line chart, area chart, etc.

Multivariate Analysis - As the name suggests, it is used to visualize more than two variables at once. Examples: stacked bar chart, dodged bar chart, etc.

Cross Tables - They are used to compare the behavior of two categorical variables (pivot tables work the same way).
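The four pieces above can be sketched with pandas. This example builds the tables that sit behind each kind of chart rather than drawing the charts, so it runs without a plotting backend. The data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "plan": ["basic", "basic", "pro", "pro", "pro", "basic"],
    "region": ["north", "north", "south", "south", "north", "south"],
    "spend": [10, 12, 30, 28, 35, 11],
})

# Univariate: distribution of one variable (the bin counts behind a histogram).
spend_hist = pd.cut(df["spend"], bins=[0, 20, 40]).value_counts().sort_index()

# Bivariate: one variable summarized against another (the data behind a bar chart).
mean_spend_by_plan = df.groupby("plan")["spend"].mean()

# Multivariate: a third variable splits each bar (stacked or dodged bar chart).
mean_spend_by_plan_region = df.pivot_table(
    index="plan", columns="region", values="spend", aggfunc="mean"
)

# Cross table: counts for each combination of two categorical variables.
gender_by_plan = pd.crosstab(df["gender"], df["plan"])

print(spend_hist)
print(mean_spend_by_plan)
print(mean_spend_by_plan_region)
print(gender_by_plan)
```

If matplotlib is installed, the same objects can be drawn directly, for example with mean_spend_by_plan.plot(kind="bar") or mean_spend_by_plan_region.plot(kind="bar", stacked=True).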

Related: Data Exploration using Pandas Library in Python
