Home

The Role of Statistics in Machine Learning

|
Updated:  
2016-10-04 19:41:52
|
TensorFlow For Dummies
Explore Book
Buy On Amazon
Some sites online would have you believe that statistics and machine learning are two completely different technologies. For example, when you read Statistics vs. Machine Learning, fight!, you get the idea that the two technologies are not only different, but downright hostile toward each other. The fact is that statistics and machine learning have a lot in common and that statistics represents one of the five tribes (schools of thought) that make machine learning feasible. The five tribes are
  • Symbolists: The origin of this tribe is in logic and philosophy. This group relies on inverse deduction to solve problems.
  • Connectionists: The origin of this tribe is in neuroscience. This group relies on backpropagation to solve problems.
  • Evolutionaries: The origin of this tribe is in evolutionary biology. This group relies on genetic programming to solve problems.
  • Bayesians: This origin of this tribe is in statistics. This group relies on probabilistic inference to solve problems.
  • Analogizers: The origin of this tribe is in psychology. This group relies on kernel machines to solve problems.
The ultimate goal of machine learning is to combine the technologies and strategies embraced by the five tribes to create a single algorithm (the master algorithm) that can learn anything. Of course, achieving that goal is a long way off. Even so, scientists such as Pedro Domingos are currently working toward that goal.

Using the Bayesian tribe strategy, you solve most problems using some form of statistical analysis. You do see strategies embraced by other tribes described, but the main reason you begin with statistics is that the technology is already well established and understood. In fact, many elements of statistics qualify more as engineering (in which theories are implemented) than science (in which theories are created). Understanding the role of algorithms in machine learning is essential to defining how machine learning works.

About This Article

This article is from the book: 

About the book author:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist specialized in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques. Because of his job as a quantitative marketing consultant and marketing researcher, he has been involved in quantitative data since 2000 with different clients and in various industries, and is one of the top 10 Kaggle data scientists.