Data exploration and visualization is the first step in building a robust machine learning model. Before modeling, we need to understand the data by exploring it with the graphs and plots available in the matplotlib and seaborn libraries. This step takes time and patience.
Plots and graphs help us analyze relationships among the variables in a dataset. We can use them to visualize missing values, outliers, skewed data and correlation among variables.
The main Python libraries used for data exploration and visualization are pandas, matplotlib and seaborn.
There are three main types of analysis, depending on how many variables a plot covers: univariate (one variable), bivariate (two variables) and multivariate (more than two variables).
Some commonly used plots and graphs are: Joint Plot, Distribution Plot, Box Plot, Bar Plot, Regression Plot, Strip Plot, Heatmap, Violin Plot, Pair Plot, Pair Grid and Facet Grid.
Visualize missing values
Visualize missing values in Bar Plot using Seaborn Library
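As a rough illustration of the idea, the sketch below counts missing values per column with pandas and draws them as a seaborn bar plot. The DataFrame and its column names are made up for the example and do not come from the linked article.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data with a few gaps (not from the original article)
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47, np.nan],
    "income": [52000, 61000, np.nan, 58000, 49000],
    "city": ["Pune", "Delhi", "Mumbai", None, "Delhi"],
})

# Count missing values per column and keep only columns that have any
missing_counts = df.isnull().sum()
missing_counts = missing_counts[missing_counts > 0].sort_values(ascending=False)

# One bar per column, bar height = number of missing values
ax = sns.barplot(x=list(missing_counts.index), y=list(missing_counts.values))
ax.set_xlabel("column")
ax.set_ylabel("missing values")
# In a notebook the figure renders automatically; in a script, call plt.show()
```

Sorting the counts first puts the columns with the most gaps on the left, which makes wide datasets easier to scan.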
Visualize outliers
What are Outliers? How to find and remove outliers using JointPlot in Seaborn Library?
What is Boxplot? How is it used to find outliers in a dataset?
How to visualize outliers in categorical variables using boxplots?
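A box plot draws its whiskers, by default, at 1.5 times the interquartile range (IQR) beyond the quartiles and marks anything further out as an individual point. The sketch below applies the same 1.5 × IQR rule by hand on a made-up series and then draws the box plot; the data and the variable names are assumptions for illustration, and this rule is only one of several ways to define an outlier.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical scores with one obviously extreme value
values = pd.Series([12, 14, 15, 15, 16, 17, 18, 19, 20, 95], name="score")

# Quartiles and interquartile range
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Points outside these fences are flagged as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

# The box plot shows the same point beyond the upper whisker
ax = sns.boxplot(x=values)
# In a notebook the figure renders automatically; in a script, call plt.show()
```

Whether a flagged point should be removed depends on the data: it may be a recording error, or it may be a genuine extreme value worth keeping.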
Visualize skewed data
What is Skewness? How to visualize it with a Histogram and how to remove it?
How to visualize skewness of numeric variables by plotting histograms?
Log Transforming the Skewed Data to get Normal Distribution
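The following sketch shows one way to check skewness and reduce it with a log transform. It generates a synthetic right-skewed sample (an assumption for the example, not data from the linked articles), measures skewness with pandas, and plots histograms before and after applying `np.log1p`. A log transform brings right-skewed, positive data closer to a normal shape but does not guarantee a normal distribution.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic right-skewed data (log-normal), seeded for reproducibility
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000), name="income")

# Skewness near 0 suggests a roughly symmetric distribution
skew_before = income.skew()

# log1p computes log(1 + x), which also handles zeros safely
log_income = np.log1p(income)
skew_after = log_income.skew()

# Histograms side by side: long right tail before, roughly symmetric after
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(income, bins=30)
axes[0].set_title("original")
axes[1].hist(log_income, bins=30)
axes[1].set_title("log-transformed")
# In a notebook the figure renders automatically; in a script, call plt.show()
```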
Visualize correlation among variables
How to find Correlation Score and plot Correlation Heatmap using Seaborn Library in Python?
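As a minimal sketch of this step, the code below computes pairwise Pearson correlation scores with `DataFrame.corr()` and passes the resulting matrix to `sns.heatmap`. The columns `hours`, `score` and `absences` are hypothetical values chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical numeric data
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 58, 61, 70, 75],
    "absences": [9, 7, 6, 3, 1],
})

# Pairwise Pearson correlation between numeric columns (values in [-1, 1])
corr = df.corr()

# annot=True writes each score inside its cell; fixing vmin/vmax keeps
# the color scale centered on zero
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
# In a notebook the figure renders automatically; in a script, call plt.show()
```

Scores near 1 or -1 indicate a strong linear relationship, and scores near 0 indicate a weak one. Correlation alone does not show that one variable causes the other.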
Other links
Boxplot Grouping: Visualizing one variable based on another variable using boxplot
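Grouping a box plot means drawing one box per category of a second variable, so that distributions can be compared side by side. The sketch below uses a made-up `department` and `salary` table (hypothetical names and values) and passes the categorical column as `x` and the numeric column as `y`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: a numeric variable and a categorical variable
df = pd.DataFrame({
    "department": ["A"] * 5 + ["B"] * 5,
    "salary": [30, 32, 35, 31, 90, 50, 52, 55, 51, 53],
})

# Per-group medians, which correspond to the center line of each box
medians = df.groupby("department")["salary"].median()

# One box per department; the value 90 in group A appears beyond the whisker
ax = sns.boxplot(data=df, x="department", y="salary")
# In a notebook the figure renders automatically; in a script, call plt.show()
```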