
Choosing the Right Algorithm for Machine Learning

Updated: 2016-07-18
Machine learning involves many different algorithms. The following rundown gives you a quick summary of what each algorithm is best at, along with its main strengths and weaknesses, followed by a short illustrative Python sketch for each.

Random Forest

Best at: Almost any machine learning problem; bioinformatics.
Pros: Can work in parallel; seldom overfits; automatically handles missing values; no need to transform any variable; no need to tweak parameters; can be used by almost anyone with excellent results.
Cons: Difficult to interpret; weaker on regression when estimating values at the extremes of the distribution of response values; biased toward the more frequent classes in multiclass problems.
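
If you want to try a random forest in Python, here is a minimal scikit-learn sketch; the synthetic dataset and the parameter values are illustrative assumptions, not recommendations.

    # Random forest on synthetic data; n_jobs=-1 trains the trees in parallel.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    model = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42)
    print(cross_val_score(model, X, y, cv=5).mean())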

Gradient Boosting

Best at: Almost any machine learning problem; search engines (solving the problem of learning to rank).
Pros: Can approximate most nonlinear functions; best-in-class predictor; automatically handles missing values; no need to transform any variable.
Cons: Can overfit if run for too many iterations; sensitive to noisy data and outliers; doesn't work well without parameter tuning.
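
A minimal scikit-learn sketch, again on synthetic data; the early-stopping setting (n_iter_no_change) is one illustrative way to guard against running too many iterations.

    # Gradient boosting with early stopping; training halts when the
    # internal validation score stops improving.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = GradientBoostingClassifier(n_estimators=500, learning_rate=0.05,
                                       n_iter_no_change=10, random_state=42)
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))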

Linear Regression

Best at: Baseline predictions; econometric predictions; modeling marketing responses.
Pros: Simple to understand and explain; seldom overfits; L1 and L2 regularization are effective for feature selection; fast to train; easy to train on big data thanks to its stochastic version.
Cons: You have to work hard to make it fit nonlinear functions; can suffer from outliers.
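
Here is a minimal sketch of linear regression in scikit-learn, with a Lasso (L1) fit added to show regularization-driven feature selection; the synthetic data and alpha value are illustrative. For big data, scikit-learn's SGDRegressor provides the stochastic version mentioned above.

    # Ordinary least squares plus a Lasso (L1) fit whose sparse coefficients
    # act as a rough form of feature selection.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, LinearRegression

    X, y = make_regression(n_samples=500, n_features=10, n_informative=3,
                           noise=10.0, random_state=42)
    ols = LinearRegression().fit(X, y)
    lasso = Lasso(alpha=1.0).fit(X, y)
    print("R^2:", ols.score(X, y))
    print("nonzero Lasso coefficients:", int(np.sum(lasso.coef_ != 0)))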

Support Vector Machines

Best at: Character recognition; image recognition; text classification.
Pros: Automatic nonlinear feature creation; can approximate complex nonlinear functions; works with only a portion of the examples (the support vectors).
Cons: Difficult to interpret when applying nonlinear kernels; suffers from too many examples; after 10,000 examples, training starts taking too long.
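
A minimal sketch on scikit-learn's small digits dataset (a character-recognition task); the C and gamma values are illustrative guesses that would normally deserve a grid search.

    # RBF-kernel SVM; the kernel creates the nonlinear features automatically.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = make_pipeline(StandardScaler(),
                          SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))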

K-Nearest Neighbors

Best at: Computer vision; multilabel tagging; recommender systems; spell-checking problems.
Pros: Fast, lazy training; can naturally handle extreme multiclass problems (like tagging text).
Cons: Slow and cumbersome in the predicting phase; can fail to predict correctly due to the curse of dimensionality.
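
A minimal scikit-learn sketch; k=5 and the synthetic data are illustrative choices.

    # k-nearest neighbors: fit() just stores the data ("fast, lazy training");
    # the real work happens at prediction time.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print(model.score(X_test, y_test))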

AdaBoost

Best at: Face detection.
Pros: Automatically handles missing values; no need to transform any variable; doesn't overfit easily; few parameters to tweak; can leverage many different weak learners.
Cons: Sensitive to noisy data and outliers; never the best-in-class predictor.
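
A minimal scikit-learn sketch using the library's default weak learners (one-level decision trees); the parameter values are illustrative.

    # AdaBoost: sequentially reweights the examples that earlier weak
    # learners got wrong.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    model = AdaBoostClassifier(n_estimators=100, learning_rate=0.5,
                               random_state=42)
    print(cross_val_score(model, X, y, cv=5).mean())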

Naive Bayes

Best at: Face recognition; sentiment analysis; spam detection; text classification.
Pros: Easy and fast to implement; doesn't require much memory and can be used for online learning; easy to understand; takes prior knowledge into account.
Cons: Strong and unrealistic feature-independence assumptions; fails at estimating rare occurrences; suffers from irrelevant features.
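
A minimal spam-detection sketch with multinomial naive Bayes; the four-document corpus is a toy made up purely for illustration.

    # Word counts feed a multinomial naive Bayes classifier.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["win money now", "cheap pills online",
             "meeting at noon", "lunch tomorrow?"]
    labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    print(model.predict(["win cheap money"]))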

Neural Networks

Best at: Image recognition; language recognition and translation; speech recognition; vision recognition.
Pros: Can approximate any nonlinear function; robust to outliers.
Cons: Very difficult to set up; difficult to tune because of too many parameters, and you also have to decide the architecture of the network; difficult to interpret; easy to overfit.
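
A minimal sketch of a small feed-forward network (multilayer perceptron) in scikit-learn; the layer sizes are an illustrative architecture choice, and early stopping is one way to curb overfitting.

    # A small MLP; hidden_layer_sizes is the architecture decision the
    # cons above warn about.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), early_stopping=True,
                      max_iter=500, random_state=42))
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))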

Logistic Regression

Best at: Ordering results by probability; modeling marketing responses.
Pros: Simple to understand and explain; seldom overfits; L1 and L2 regularization are effective for feature selection; the best algorithm for predicting the probability of an event; fast to train; easy to train on big data thanks to its stochastic version.
Cons: You have to work hard to make it fit nonlinear functions; can suffer from outliers.
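
A minimal scikit-learn sketch that returns class probabilities, which is what you need for ordering results by probability; the data and parameters are illustrative.

    # Logistic regression with the default L2 penalty; predict_proba gives
    # per-class probabilities.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = LogisticRegression(C=1.0).fit(X_train, y_train)
    print(model.predict_proba(X_test[:3]))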

SVD

Best at: Recommender systems.
Pros: Can restructure data in a meaningful way.
Cons: Difficult to understand why data has been restructured in a certain way.
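
A minimal sketch using scikit-learn's TruncatedSVD on a made-up ratings matrix, the kind of restructuring a recommender system builds on.

    # Truncated SVD compresses a toy user-item ratings matrix into two
    # latent components.
    import numpy as np
    from sklearn.decomposition import TruncatedSVD

    ratings = np.array([[5, 4, 0, 1],
                        [4, 5, 1, 0],
                        [0, 1, 5, 4],
                        [1, 0, 4, 5]], dtype=float)
    svd = TruncatedSVD(n_components=2, random_state=42)
    user_factors = svd.fit_transform(ratings)  # users in latent space
    print(user_factors.shape)  # (4, 2)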

PCA

Best at: Removing collinearity; reducing the dimensions of the dataset.
Pros: Can reduce data dimensionality.
Cons: Implies strong linear assumptions (each component is a weighted sum of the features).
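
A minimal scikit-learn sketch that keeps enough components to explain 95 percent of the variance; the threshold is an illustrative choice.

    # PCA on standardized features; each retained component is a weighted
    # sum of the original features.
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = load_digits(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X_scaled)
    print(X.shape, "->", X_reduced.shape)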

K-Means

Best at: Segmentation.
Pros: Fast in finding clusters; can detect outliers in multiple dimensions.
Cons: Suffers from multicollinearity; clusters are spherical, so it can't detect groups of other shapes; solutions are unstable and depend on initialization.
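
A minimal scikit-learn sketch; running several random restarts (n_init) is one standard way to tame the initialization instability noted above.

    # K-means with ten random restarts; k=3 matches the synthetic blobs.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=600, centers=3, random_state=42)
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
    labels = kmeans.fit_predict(X)
    print(labels[:10], kmeans.cluster_centers_.shape)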

About This Article

This article is from the book TensorFlow For Dummies.

About the book authors:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist who specializes in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data-mining and machine-learning techniques. Through his work as a quantitative marketing consultant and marketing researcher, he has worked with quantitative data since 2000 for clients in various industries, and he is one of the top 10 Kaggle data scientists.