
Histograms in R with ggplot2

Updated: 2017-07-04 2:24:06
Statistical Analysis with R Essentials For Dummies
The Base R graphics toolset will get you started, but if you really want to shine at visualization, it's a good idea to learn ggplot2. The package gives R graphics code an easy-to-learn structure. To learn that structure, make sure you have ggplot2 in the library so that you can follow what comes next. (In RStudio, find ggplot2 on the Packages tab and click its check box.)
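If you prefer the console to the Packages tab, a minimal setup might look like the following. It assumes that ggplot2 is already installed and that Cars93, the data frame used in the examples that follow, comes from the MASS package.

```r
# Attach the packages for the current session.
# Assumes ggplot2 is installed; if not, run install.packages("ggplot2") first.
library(ggplot2)
library(MASS)      # provides the Cars93 data frame

# Quick look at the variable plotted in the examples that follow
summary(Cars93$Price)
```

Loading a package with library() has the same effect as clicking its check box on the Packages tab.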

A graph starts with ggplot(), which takes two arguments. The first argument is the source of the data. The second argument maps the data components of interest into components of the graph. The function that does the job is aes().

To begin a histogram for Price in Cars93 (a data frame in the MASS package), the function is

ggplot(Cars93, aes(x=Price))

The aes() function associates Price with the x-axis. In ggplot-world, this is called an aesthetic mapping. In fact, each argument to aes() is called an aesthetic.

This line of code draws the following figure, which is just a grid with a gray background and Price on the x-axis.

[Figure: Applying ggplot() and nothing else.]

Well, what about the y-axis? Does anything in the data map into it? No. That's because this is a histogram and nothing explicitly in the data provides a y-value for each x. So you can't say "y=" in aes(). Instead, you let R do the work to calculate the heights of the bars in the histogram.

And what about that histogram? How do you put it into this blank grid? You have to add something indicating that you want to plot a histogram and let R take care of the rest. What you add is a geom function ("geom" is short for "geometric object").

These geom functions come in a variety of types. ggplot2 supplies one for almost every graphing need, and provides the flexibility to work with special cases. To draw a histogram, the geom function to use is called geom_histogram().

How do you add geom_histogram() to ggplot()? With a plus sign:

ggplot(Cars93, aes(x=Price)) +
  geom_histogram()

This produces the following figure. The grammar rules tell ggplot2 that when the geometric object is a histogram, R does the necessary calculations on the data and produces the appropriate plot.

[Figure: The initial histogram for Price in Cars93.]
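To peek at the calculations R does for the histogram, you can ask ggplot2 for the values it computed for the layer. This is only a sketch: it assumes Cars93 comes from the MASS package, and the names p and bins are invented for the example.

```r
library(ggplot2)
library(MASS)   # assumed source of Cars93

# bins = 30 is the default; stating it here avoids the onscreen message
p <- ggplot(Cars93, aes(x = Price)) +
  geom_histogram(bins = 30)

# layer_data() returns the values ggplot2 computed for a layer:
# one row per bin, with the bar height in the count column
bins <- layer_data(p, 1)
head(bins[, c("xmin", "xmax", "count")])

# Every car lands in exactly one bin, so the counts add up to the number of rows
sum(bins$count) == nrow(Cars93)
```

The count column is the y-value that you never had to supply in aes().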

At the bare minimum, ggplot2 graphics code has to have data, aesthetic mappings, and a geometric object. It's like answering a logical sequence of questions: What's the source of the data? What parts of the data are you interested in? Which parts of the data correspond to which parts of the graph? How do you want the graph to look?

Beyond those minimum requirements, you can modify the graph. Each bar is called a bin, and by default, geom_histogram() uses 30 of them. After plotting the histogram, ggplot2 displays an onscreen message that advises picking a better value with binwidth (which, unsurprisingly, specifies the width of each bin) to change the graph's appearance. Accordingly, you use binwidth = 5 as an argument in geom_histogram().
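One way to see what binwidth changes is to count the bins with and without it. The sketch below assumes Cars93 comes from the MASS package; base_plot, default_hist, wide_hist, n_default, and n_wide are names made up for the illustration.

```r
library(ggplot2)
library(MASS)   # assumed source of Cars93

base_plot <- ggplot(Cars93, aes(x = Price))

# Default behavior, with the number of bins stated explicitly
default_hist <- base_plot + geom_histogram(bins = 30)

# Fixed bin width: Price is in thousands of dollars, so each bin spans $5,000
wide_hist <- base_plot + geom_histogram(binwidth = 5)

# Number of bars in each version (one row of layer data per bin)
n_default <- nrow(layer_data(default_hist, 1))
n_wide <- nrow(layer_data(wide_hist, 1))
c(default = n_default, binwidth_5 = n_wide)
```

A wider bin means fewer bars, which usually gives a smoother outline for a data set of this size.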

Additional arguments modify the way the bars look:

geom_histogram(binwidth=5, color = "black", fill = "white")

With another function, labs(), you modify the labels for the axes and supply a title for the graph:

labs(x = "Price (x $1000)", y="Frequency",title="Prices of 93 Models of 1993 Cars")

Altogether now:

ggplot(Cars93, aes(x=Price)) +
  geom_histogram(binwidth=5, color="black", fill="white") +
  labs(x = "Price (x $1000)", y="Frequency", title="Prices of 93 Models of 1993 Cars")

The result is the following figure.

[Figure: The finished Price histogram.]
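Because each piece is joined with a plus sign, you can also build the same graph one step at a time by storing it in a variable. This is a sketch rather than the book's own code: it assumes Cars93 comes from the MASS package, and price_plot is a name chosen for the example.

```r
library(ggplot2)
library(MASS)   # assumed source of Cars93

# Step 1: data and aesthetic mapping (draws only the blank grid)
price_plot <- ggplot(Cars93, aes(x = Price))

# Step 2: add the geometric object
price_plot <- price_plot +
  geom_histogram(binwidth = 5, color = "black", fill = "white")

# Step 3: add axis labels and a title
price_plot <- price_plot +
  labs(x = "Price (x $1000)", y = "Frequency",
       title = "Prices of 93 Models of 1993 Cars")

# Display the finished histogram
print(price_plot)
```

Keeping the plot in a variable lets you try a different binwidth or label without retyping the whole expression.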

About This Article

This article is from the book: 

About the book author:

Joseph Schmuller, PhD, is a cognitive scientist and statistical analyst. He creates online learning tools and writes books on the technology of data science. His books include R All-in-One For Dummies and R Projects For Dummies.