
Quick R Project: Understanding the Complexity Parameter

Updated: 2018-04-10 12:56:34
From The Book: Statistical Analysis with R Essentials For Dummies
Rattle is a terrific teaching tool for R programming. In this little two-part project, you can use Rattle to help wrap your brain around the complexity parameter (cp) and what it entails.

The default value of cp is .01. Explaining how cp is calculated is beyond the scope of this discussion. Just think of cp as the “minimum benefit” that a split must add to the tree: if a split doesn’t improve the tree by at least that much (the value of cp), rpart() doesn’t add it.
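Rattle builds its tree by calling rpart() behind the scenes, so you can check the default cp without the GUI. The following is a minimal sketch, not Rattle's own code: it calls rpart() directly and assumes the kyphosis data frame that ships with the rpart package as a stand-in dataset.

```r
library(rpart)  # kyphosis is an example dataset bundled with rpart

# Fit a classification tree, leaving cp at rpart's default
fit_default <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosis, method = "class")

# The cp that was used is stored in the fitted object's control list
print(fit_default$control$cp)

# The same threshold set explicitly through rpart.control()
fit_explicit <- rpart(Kyphosis ~ Age + Number + Start,
                      data = kyphosis, method = "class",
                      control = rpart.control(cp = 0.01))

# printcp() lists the cp values and tree sizes rpart computed for this fit
printcp(fit_default)
```

Because both fits use the same data and the same cp, they produce trees of the same size.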

What happens if you set cp to .00? A split no longer has to add any minimum benefit, so you wind up with the most complex tree that rpart()’s other stopping rules (such as the minimum number of cases a node needs before it can be split) allow. So here’s the first part of this quick project: Set cp to .00, click Execute, and then use

library(rpart.plot)  # provides prp()
prp(crs$rpart, cex=1, varlen=0, branch=0)  # crs$rpart holds the tree Rattle just built

to draw the tree. Evaluate this tree against the Testing set and look at the overall error rate. Compared with the 6.9 percent error rate of the original tree (built with the default cp of .01), is the extra complexity worth adding?
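To try the same comparison outside Rattle, here is a hedged sketch that fits one tree with cp = .01 and one with cp = 0 on a training set, then measures each tree's overall error rate on a held-out test set. The dataset (kyphosis from rpart), the 70/30 split, the seed, and the helper names error_rate() and n_splits() are illustrative assumptions, not Rattle's settings, so the numbers won't match the 6.9 percent figure.

```r
library(rpart)

set.seed(42)  # arbitrary seed so the random split is reproducible
n <- nrow(kyphosis)
train_idx <- sample(n, size = round(0.7 * n))  # hypothetical 70/30 split
train <- kyphosis[train_idx, ]
test  <- kyphosis[-train_idx, ]

# Hypothetical helper: share of test cases whose predicted class is wrong
error_rate <- function(fit, data) {
  pred <- predict(fit, newdata = data, type = "class")
  mean(pred != data$Kyphosis)
}

# Hypothetical helper: number of splits = internal (non-leaf) nodes
n_splits <- function(fit) sum(fit$frame$var != "<leaf>")

fit_01 <- rpart(Kyphosis ~ ., data = train, method = "class",
                control = rpart.control(cp = 0.01))
fit_00 <- rpart(Kyphosis ~ ., data = train, method = "class",
                control = rpart.control(cp = 0))

comparison <- data.frame(
  cp         = c(0.01, 0),
  splits     = c(n_splits(fit_01), n_splits(fit_00)),
  test_error = c(error_rate(fit_01, test), error_rate(fit_00, test))
)
print(comparison)
```

If the cp = 0 tree has more splits but no lower test error, the extra complexity isn't buying you anything.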

The second part of this project goes in the other direction. Set cp to a higher value, such as .10, which makes it harder for a split to earn a place in the tree. Click Execute, and then draw the tree. It looks way less complex than the tree with cp = .01, doesn’t it? Evaluate it against the Testing set. How does the overall error rate compare?
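Both parts of the project can be sketched together as a loop over several cp values that records each tree's number of splits and its test-set error rate. As before, this is an assumption-laden sketch rather than Rattle's procedure: the kyphosis data, the 70/30 split, and the seed are chosen only to keep it self-contained.

```r
library(rpart)

set.seed(42)  # arbitrary seed for a reproducible (hypothetical) 70/30 split
n <- nrow(kyphosis)
train_idx <- sample(n, size = round(0.7 * n))
train <- kyphosis[train_idx, ]
test  <- kyphosis[-train_idx, ]

cp_values <- c(0, 0.01, 0.10)

# One row per cp value: the cp, the number of splits, and the test error
results <- t(sapply(cp_values, function(cp) {
  fit <- rpart(Kyphosis ~ ., data = train, method = "class",
               control = rpart.control(cp = cp))
  pred <- predict(fit, newdata = test, type = "class")
  c(cp         = cp,
    splits     = sum(fit$frame$var != "<leaf>"),
    test_error = mean(pred != test$Kyphosis))
}))
print(results)
```

On a dataset this small (81 rows), rpart's minimum node sizes may stop growth before cp matters, so neighboring cp values can produce the same tree; a larger dataset usually shows the trend more clearly.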

On a live tree that grows outdoors in your garden, what do you call the process of cutting branches to make the tree look better? Does pruning sound familiar? That’s also the name for eliminating splits to make a decision tree less complex (which is what increasing the cp does).
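The rpart package also lets you prune after the fact with its prune() function: grow a large tree with cp = 0, then cut away the splits whose benefit falls below a larger cp. This sketch again assumes the kyphosis data; n_splits() is a hypothetical helper.

```r
library(rpart)

# Grow the largest tree the other stopping rules allow (cp = 0) ...
full_tree <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
                   control = rpart.control(cp = 0))

# ... then snip off every split whose benefit falls below cp = 0.10
pruned_tree <- prune(full_tree, cp = 0.10)

# Hypothetical helper: number of splits = internal (non-leaf) nodes
n_splits <- function(fit) sum(fit$frame$var != "<leaf>")
print(c(full = n_splits(full_tree), pruned = n_splits(pruned_tree)))
```

Raising cp in rpart.control() keeps weak splits from being attempted in the first place, while prune() removes them from a tree that has already been grown. Either way, you end up with a simpler tree.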

About This Article

This article is from the book: Statistical Analysis with R Essentials For Dummies

About the book author:

Joseph Schmuller, PhD, is a cognitive scientist and statistical analyst. He creates online learning tools and writes books on the technology of data science. His books include R All-in-One For Dummies and R Projects For Dummies.