Exploratory Data Analysis (EDA)

Updated: 2016-03-26 07:28:17
Statistics for Big Data For Dummies

Before you apply statistical techniques to a dataset, it's important to examine the data to understand its basic properties. You can use a series of techniques that are collectively known as Exploratory Data Analysis (EDA) to analyze a dataset. EDA helps ensure that you choose the correct statistical techniques to analyze and forecast the data. The two basic types of EDA techniques are graphical techniques and quantitative techniques.

Graphical EDA techniques

Graphical EDA techniques show the key properties of a dataset in a convenient format. It's often easier to understand the properties of a variable, and the relationships between variables, by looking at graphs than by looking at the raw data. Which graph you use depends on the type of data being analyzed. Common choices include the following:

  • Box plots

  • Histograms

  • Normal probability plots

  • Scatter plots
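As a rough illustration, each of these graphs is built from a small set of numbers that you can compute directly. The sketch below uses only Python's standard library and computes the five-number summary behind a box plot, the bin counts behind a histogram, the paired points of a normal probability plot, and the correlation that a scatter plot makes visible. The function names and the sample data are hypothetical, invented for this example; the plotting-position formula is one common convention among several.

```python
import math
import statistics
from collections import Counter


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max): the values a box plot draws."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def histogram_counts(values, bin_width):
    """Count the values in each bin [k * width, (k + 1) * width)."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def normal_probability_points(values):
    """Pair each sorted value with the standard normal quantile of its rank.

    Points that fall close to a straight line suggest roughly normal data.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = statistics.NormalDist()
    # (i + 0.5) / n is an assumed plotting position; other choices exist.
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


def pearson_correlation(xs, ys):
    """Measure the strength of the linear relationship a scatter plot shows."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample data, made up for illustration only.
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8, 9]
exam_scores = [52, 55, 61, 64, 70, 73, 79, 84, 88]

print("Box plot values:", five_number_summary(exam_scores))
print("Histogram bins:", histogram_counts(exam_scores, bin_width=10))
print("Normal plot points:", normal_probability_points(exam_scores)[:3])
print("Correlation:", pearson_correlation(hours_studied, exam_scores))
```

In practice you would pass these same values to a plotting library; computing them by hand here simply shows what each graph summarizes.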

Quantitative EDA techniques

Quantitative EDA techniques provide a more rigorous method of determining the key properties of a dataset. Two of the most important of these techniques are

  • Interval estimation.

  • Hypothesis testing.

An interval estimate is a range of values within which an unknown quantity, for example a population mean, is likely to fall. Hypothesis testing checks specific propositions about a dataset, such as

  • The mean value of the dataset.

  • The standard deviation of the dataset.

  • The probability distribution the dataset follows.

Hypothesis testing is a core technique in statistics.
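To make the two quantitative techniques concrete, here is a minimal sketch of an interval estimate for a mean and a two-sided hypothesis test about that mean. It assumes a large-sample normal approximation (a z-based interval and z-test); for small samples a t-distribution would normally be used instead, and that is not shown. The function names and the sample data are hypothetical, and tests about the standard deviation or the distribution are not covered.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Return a (low, high) interval estimate for the population mean.

    Assumes the sample is large enough for the normal approximation.
    """
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    # Critical value of the standard normal for the requested confidence.
    z_crit = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z_crit * std_err, mean + z_crit * std_err


def z_test_mean(values, hypothesized_mean):
    """Test H0: population mean == hypothesized_mean (two-sided).

    Returns the z statistic and the p-value under the normal approximation.
    """
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    z = (statistics.fmean(values) - hypothesized_mean) / std_err
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample of 49 measurements, made up for illustration only.
sample = [100 + (i % 7) - 3 for i in range(49)]

low, high = mean_confidence_interval(sample)
print(f"95% interval estimate for the mean: ({low:.2f}, {high:.2f})")

for claimed_mean in (100, 105):
    z, p = z_test_mean(sample, claimed_mean)
    verdict = "reject" if p < 0.05 else "fail to reject"
    print(f"H0: mean = {claimed_mean}: z = {z:.2f}, p = {p:.4f} ({verdict} at the 5% level)")
```

The two results are linked: a hypothesized mean that lies outside the 95% interval is rejected by the two-sided test at the 5% level, and one that lies inside it is not.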

About This Article

This article is from the book: 

About the book authors:

Alan Anderson, PhD, teaches finance, economics, statistics, and math at Fordham and Fairfield universities as well as at Manhattanville and Purchase colleges. Outside of academia, he has many years of experience working as an economist, risk manager, and fixed income analyst. Alan received his PhD in economics from Fordham University and an M.S. in financial engineering from Polytechnic University.

David Semmelroth has two decades of experience translating customer data into actionable insights across the financial services, travel, and entertainment industries. David has consulted for Cedar Fair, Wachovia, National City, and TD Bank.