- For customer segmentation and/or community detection in the social sphere, for example, you'd need clustering algorithms.
- For customer retention or to develop a recommender system, you'd use classification algorithms.
- For credit scoring or predicting the next outcome of time-driven events, you'd use a regression algorithm.
Some predictive analytics projects succeed best by building an ensemble model, a group of models that operate on the same data. An ensemble model uses a predefined mechanism to gather outcomes from all its component models and provide a final outcome for the user.
Models can take various forms — a query, a collection of scenarios, a decision tree, or an advanced mathematical analysis. In addition, certain models work best for certain data and analyses. You can (for example) use classification algorithms that employ decision rules to decide the outcome of a given scenario or transaction, addressing questions like these:
- Is this customer likely to respond to our marketing campaign?
- Is this money-transfer likely to be part of a money-laundering scheme?
- Is this loan applicant likely to default on the loan?
Regression algorithms can be used to forecast continuous data, such as predicting the trend for a stock movement given its past prices.
Decision trees, support vector machines, neural networks, logistic, and linear regressions are some of the most common algorithms. Although their mathematical implementations differ, these predictive models generate comparable results. The decision trees are more popular, because they're easy to understand; you can follow the path to a given decision.
Classification algorithms are great for the type of analysis when the target is known (such as identifying spam emails). On the other hand, when the target variable is unknown, clustering algorithms are your best bet. They allow you to cluster or group your data into meaningful groups based on the similarities among the group members.
These algorithms are widely popular. There are many tools, both commercial and open-source, that implement them. With data accumulation thriving and accelerating (that is, big data), and cost-efficient hardware and platforms (such as cloud computing and Hadoop), predictive analytics tools are experiencing a boom.Data and business objectives aren't the only factors to consider when you're selecting an algorithm. The expertise of your data scientists is of tremendous value at this point; picking an algorithm that will get the job done is often a tricky combination of science and art. The art part comes from experience and proficiency in the business domain, which also plays a critical role in identifying a model that can serve business objectives accurately.