-
Defining Business Objectives
The project starts with a well-defined business objective. The model is supposed to answer a business question. Stating that objective clearly lets you define the scope of your project and gives you a concrete test for measuring its success.
-
Preparing Data
You’ll use historical data to train your model. The data is usually scattered across multiple sources and may require cleansing and preparation. It may contain duplicate records and outliers; depending on the analysis and the business objective, you decide whether to keep or remove them. The data may also have missing values, may need to be transformed, and may be used to generate derived attributes that have more predictive power for your objective. Overall, the quality of the data largely determines the quality of the model.
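The preparation steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the field names (income, spend, spend_ratio), the sample records, and the choice of filling gaps with the median are all assumptions made for the example, not a recommendation for your data.

```python
from statistics import median

def prepare(records):
    """Deduplicate records, fill missing income values, add a derived attribute."""
    # Drop exact duplicate records, keeping the first occurrence of each.
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))

    # Fill missing 'income' values with the median of the known values
    # (one common choice; whether it suits you depends on the objective).
    known = [r["income"] for r in unique if r["income"] is not None]
    fill = median(known)
    for r in unique:
        if r["income"] is None:
            r["income"] = fill
        # Derived attribute (hypothetical): spending as a share of income.
        # Assumes income is never zero in this toy data.
        r["spend_ratio"] = r["spend"] / r["income"]
    return unique

# Made-up sample records for illustration only.
raw = [
    {"id": 1, "income": 50000, "spend": 5000},
    {"id": 1, "income": 50000, "spend": 5000},  # duplicate record
    {"id": 2, "income": None, "spend": 3000},   # missing value
    {"id": 3, "income": 30000, "spend": 6000},
]
clean = prepare(raw)
```

Whether to drop duplicates or outliers at all remains a judgment call tied to the business objective; the sketch simply shows where those decisions would live in code.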
-
Sampling Your Data
You’ll need to split your data into two sets: a training dataset and a test dataset. You build the model using the training dataset and use the test dataset to verify the accuracy of the model’s output. Doing so is absolutely crucial. Otherwise you run the risk of overfitting without noticing it: a model trained on a limited dataset can pick up all the characteristics (both the signal and the noise) that are true only for that particular dataset. A model that’s overfitted to a specific dataset will perform miserably when you run it on other datasets. A test dataset that the model never saw during training gives you a valid way to measure your model’s performance.
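Here is a minimal sketch of such a split, using only Python’s standard library. The function name, the 80/20 ratio, and the fixed seed are assumptions chosen for the example; many libraries offer an equivalent helper.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rows = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

# Stand-in data: 100 record identifiers.
data = list(range(100))
train, test = train_test_split(data)
```

Shuffling before splitting matters when the records are stored in some meaningful order (by date or by customer, for example); for time-ordered data you may prefer to hold out the most recent records instead.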
-
Building the Model
Sometimes the data or the business objectives lend themselves to a specific algorithm or model. Other times the best approach is not so clear-cut. As you explore the data, run as many algorithms as you can and compare their outputs. Base your choice of the final model on the overall results. Sometimes you’re better off running several models on the same data and either choosing the final model by comparing their outputs or combining them into an ensemble.
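Comparing candidate models can be as simple as fitting each one on the training data and scoring it on the test data. The sketch below is a toy illustration under stated assumptions: the two candidates (a constant-mean baseline and a least-squares line), the mean squared error metric, and the made-up data are all chosen only to keep the example short.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the average training target."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Second candidate: a least-squares straight line through the data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

# Made-up data: roughly y = 2x + 1 with a small deterministic wobble.
xs = list(range(10))
ys = [2 * x + 1 + (0.5 if x % 3 == 0 else -0.5) for x in xs]

# Even positions train the models, odd positions are held out for testing.
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {
    name: mse(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)  # candidate with the lowest test error
```

The same loop extends to any number of candidates; the important design choice is that every model is scored on data it did not see during training.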
-
Deploying the Model
After building the model, you have to deploy it to reap its benefits. That process may require coordination with other departments, so aim to build a model that can actually be deployed. Also be sure you know how to present your results to the business stakeholders in an understandable and convincing way so that they adopt your model. After the model is deployed, you’ll need to monitor its performance and continue improving it. Most models decay over time, so keep yours up to date by refreshing it with newly available data.
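Monitoring for decay can start with a very simple rule: compare the model’s recent error against the error you measured at deployment time. The sketch below assumes a hypothetical helper name, a made-up tolerance, and invented weekly error figures; real monitoring would use whatever success measure you defined for the business objective.

```python
def needs_refresh(baseline_error, recent_errors, tolerance=0.05):
    """Return True when the average recent error exceeds baseline + tolerance."""
    if not recent_errors:
        return False  # nothing observed yet, so no evidence of decay
    recent = sum(recent_errors) / len(recent_errors)
    return recent > baseline_error + tolerance

# Invented weekly error rates after deployment (illustration only).
baseline = 0.10
weekly_error = [0.10, 0.11, 0.12, 0.16, 0.19, 0.21]

early_check = needs_refresh(baseline, weekly_error[:3])   # first three weeks
late_check = needs_refresh(baseline, weekly_error[-3:])   # last three weeks
```

With these figures, the early weeks stay within tolerance while the later weeks cross it, which is the point where you would retrain on newly available data.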