The 9 laws of data mining: a reference guide
Pioneering data miner Thomas Khabaza developed his “Nine Laws of Data Mining” to guide new data miners as they get down to work. This reference guide shows you what each of these laws means for your everyday work.
-
1st Law of Data Mining, or “Business Goals Law”: Business objectives are the origin of every data mining solution.
A data miner is someone who discovers useful information from data to support specific business goals. Data mining isn’t defined by the tool you use.
-
2nd Law of Data Mining, or “Business Knowledge Law”: Business knowledge is central to every step of the data mining process.
You don’t have to be a fancy statistician to do data mining, but you do have to know something about what the data signifies and how the business works.
-
3rd Law of Data Mining, or “Data Preparation Law”: Data preparation is more than half of every data mining process.
Pretty much every data miner will spend more time on data preparation than on analysis.
-
4th Law of Data Mining, or “No Free Lunch for the Data Miner”: The right model for a given application can only be discovered by experiment.
In data mining, models are selected through trial and error.
-
5th Law of Data Mining: There are always patterns in the data.
As a data miner, you explore data in search of useful patterns. Understanding patterns in the data enables you to influence what happens in the future.
-
6th Law of Data Mining, or “Insight Law”: Data mining amplifies perception in the business domain.
Data mining methods enable you to understand your business better than you could have done without them.
-
7th Law of Data Mining, or “Prediction Law”: Prediction increases information locally by generalization.
Data mining helps us use what we know to make better predictions (or estimates) of things we don’t know.
-
8th Law of Data Mining, or “Value Law”: The value of data mining results is not determined by the accuracy or stability of predictive models.
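Your model must produce predictions that are good enough, and consistent enough, to serve the business goal. That’s it. Extra accuracy or stability that doesn’t change a business outcome adds no value.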
-
9th Law of Data Mining, or “Law of Change”: All patterns are subject to change.
Any model that gives you great predictions today may be useless tomorrow.
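The 4th Law's "trial and error" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is made up, the two candidate models (a mean baseline and a least-squares straight line) are arbitrary picks, and names such as `select_model` and `CANDIDATES` are hypothetical. The point is the workflow: fit each candidate on one part of the data, score it on data it has not seen, and keep whichever does best.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate: least-squares straight line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def mean_abs_error(model, xs, ys):
    """Average absolute gap between predictions and actual values."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def select_model(candidates, train, holdout):
    """Fit every candidate on train, score it on holdout, return (best_name, scores)."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mean_abs_error(model, *holdout)
    best = min(scores, key=scores.get)  # lowest holdout error wins
    return best, scores


# Made-up illustrative data (an assumption, not from any real project).
train = ([1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 9.0, 10.8, 13.1])
holdout = ([7, 8, 9], [15.0, 17.1, 18.9])

CANDIDATES = {"mean_baseline": fit_mean, "straight_line": fit_line}
best, scores = select_model(CANDIDATES, train, holdout)
print(best, scores)
```

Rerunning the same comparison on fresh data from time to time is one simple way to respect the 9th Law: if the holdout error of the chosen model climbs, the pattern it learned may have changed.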
Phases of the data mining process
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant data-mining process framework. It’s an open standard; anyone may use it. The following list describes the various phases of the process.
-
Business understanding: Get a clear understanding of the problem you’re out to solve, how it impacts your organization, and your goals for addressing it. Tasks in this phase include:
-
Identifying your business goals
-
Assessing your situation
-
Defining your data mining goals
-
Producing your project plan
-
Data understanding: Review the data that you have, document it, and identify data management and data quality issues. Tasks for this phase include:
-
Gathering data
-
Describing data
-
Exploring data
-
Verifying data quality
-
Data preparation: Get your data ready to use for modeling. Tasks for this phase include:
-
Selecting data
-
Cleaning data
-
Constructing data
-
Integrating data
-
Formatting data
-
Modeling: Use mathematical techniques to identify patterns within your data. Tasks for this phase include:
-
Selecting techniques
-
Designing tests
-
Building models
-
Assessing models
-
Evaluation: Review the patterns you have discovered and assess their potential for business use. Tasks for this phase include:
-
Evaluating results
-
Reviewing the process
-
Determining the next steps
-
Deployment: Put your discoveries to work in everyday business. Tasks for this phase include:
-
Planning deployment (your methods for integrating data mining discoveries into use)
-
Reporting final results
-
Reviewing final results
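To make a few of the tasks listed above more concrete, here is a minimal Python sketch of one data understanding task (describing data), one quality check (counting missing values), and one data preparation task (cleaning). Everything in it is an assumption for illustration: the customer records are invented, and the function names `describe`, `missing_counts`, and `clean` are hypothetical helpers, not part of CRISP-DM or of any particular tool.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "region": "north"},
    {"customer_id": 2, "age": None, "region": "south"},
    {"customer_id": 3, "age": 51, "region": None},
    {"customer_id": 3, "age": 51, "region": None},  # exact duplicate row
    {"customer_id": 4, "age": 29, "region": "north"},
]


def describe(rows):
    """Describing data: report the row count and the field names present."""
    fields = sorted({key for row in rows for key in row})
    return {"rows": len(rows), "fields": fields}


def missing_counts(rows):
    """Verifying data quality: count missing (None) values per field."""
    counts = Counter()
    for row in rows:
        for key, value in row.items():
            if value is None:
                counts[key] += 1
    return dict(counts)


def clean(rows, required):
    """Cleaning data: drop exact duplicates and rows missing a required field."""
    seen = set()
    kept = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue  # skip a row identical to one already processed
        seen.add(fingerprint)
        if all(row.get(field) is not None for field in required):
            kept.append(row)
    return kept


print(describe(records))
print(missing_counts(records))
print(clean(records, required=["customer_id", "age"]))
```

Which fields count as required, and whether a duplicate is an error or a legitimate repeat, are business questions rather than technical ones, which is where the 2nd Law comes back in.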