How to Extract Variables, Observations, and Values from a Data Frame in R

Updated: 2016-03-26 07:30:36 | From the book: Statistical Analysis with R Essentials For Dummies

In many cases, you can extract values from a data frame in R by pretending that it’s a matrix. But although data frames may look like matrices, they definitely are not. Unlike matrices and arrays, data frames are not internally stored as vectors but as lists of vectors.
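The list-of-vectors point is easy to check for yourself. The sketch below rebuilds a small baskets.df so that it runs on its own. Only Granny's scores and Geraldine's third-game score appear in this article; the rest of Geraldine's values and the row names other than "3rd" are assumptions made for illustration.

```r
# Hypothetical reconstruction of baskets.df. Only Granny's values and
# Geraldine's third-game score (2) are taken from the article.
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6, 9, 3),
  Geraldine = c(5, 4, 2, 4, 12, 9),
  row.names = c("1st", "2nd", "3rd", "4th", "5th", "6th")
)

is.list(baskets.df)     # TRUE: a data frame is a list underneath
is.matrix(baskets.df)   # FALSE: it only looks like a matrix
typeof(baskets.df)      # "list"
length(baskets.df)      # 2: one list element per variable (column)
```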

Pretending it’s a matrix

If you want to extract values from a data frame, you can just pretend it’s a matrix and start from there. You can use index numbers, names, or logical vectors for selection, like you would with matrices. For example, you can get the number of baskets scored by Geraldine in the third game like this:

> baskets.df["3rd", "Geraldine"]
[1] 2

Likewise, you can get all the baskets that Granny scored using the column index, like this:

> baskets.df[, 1]
[1] 12 4 5 6 9 3

Or, if you want this to be a data frame, you can use the argument drop=FALSE exactly as you do with matrices:

> str(baskets.df[, 1, drop = FALSE])
'data.frame': 6 obs. of 1 variable:
 $ Granny: num 12 4 5 6 9 3

Note that, unlike with matrices, the row names are dropped if you don’t specify the drop=FALSE argument.
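As a rough sketch, here are the three selection styles side by side, along with the difference in how row names are treated. The data frame is a hypothetical reconstruction: Geraldine's scores other than the third game, and the row names other than "3rd", are assumed for the example.

```r
# Hypothetical reconstruction of baskets.df (see assumptions above).
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6, 9, 3),
  Geraldine = c(5, 4, 2, 4, 12, 9),
  row.names = c("1st", "2nd", "3rd", "4th", "5th", "6th")
)

# Selection by names, by index numbers, and by a logical vector
by.name    <- baskets.df["3rd", "Geraldine"]                  # 2
by.index   <- baskets.df[3, 2]                                # 2
by.logical <- baskets.df[baskets.df[, "Granny"] > 5, "Granny"] # 12 6 9

# Extracting one column from a data frame drops the row names ...
names(baskets.df[, 1])                  # NULL

# ... whereas the same extraction from a matrix keeps them as names
baskets.mat <- as.matrix(baskets.df)
names(baskets.mat[, 1])                 # "1st" "2nd" "3rd" "4th" "5th" "6th"

# With drop = FALSE the result stays a data frame and keeps its row names
rownames(baskets.df[, 1, drop = FALSE])
```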

Putting your dollar where your data is

As a careful reader, you've already noticed that every variable is preceded by a dollar sign ($) in the output from str(). R isn't necessarily pimping your data here; the dollar sign is simply a specific way of accessing variables. To access the variable Granny, you can use the dollar sign like this:

> baskets.df$Granny
[1] 12 4 5 6 9 3

So you specify the data frame, followed by a dollar sign and then the name of the variable. You don't have to surround the variable name with quotation marks (as you would when using names inside the square brackets). R returns a vector with all the values contained in that variable. Note again that the row names are dropped here.

With this dollar-sign method, you can access only one variable at a time. If you want to access multiple variables at once using their names, you need to use the square brackets.
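To make the contrast concrete, the following sketch pulls one variable with the dollar sign and then two variables with square brackets. As before, the data frame is a hypothetical reconstruction in which only Granny's scores and Geraldine's third-game score come from the article.

```r
# Hypothetical reconstruction of baskets.df (see assumptions above).
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6, 9, 3),
  Geraldine = c(5, 4, 2, 4, 12, 9),
  row.names = c("1st", "2nd", "3rd", "4th", "5th", "6th")
)

# One variable at a time: the result is a plain vector without row names
granny <- baskets.df$Granny

# Several variables at once: pass a character vector of names in brackets.
# The result is a data frame, with columns in the order you asked for.
both <- baskets.df[, c("Geraldine", "Granny")]

is.vector(granny)   # TRUE
class(both)         # "data.frame"
names(both)         # "Geraldine" "Granny"
```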

About This Article

About the book author:

Andrie de Vries is a leading R expert and Business Services Director for Revolution Analytics. With over 20 years of experience, he provides consulting and training services in the use of R.

Joris Meys is a statistician, R programmer and R lecturer with the faculty of Bio-Engineering at the University of Ghent.