A data frame can be extended with new variables in R. You may, for example, get data from another player on Granny’s team. Or you may want to calculate a new variable from the other variables in the dataset, like the total sum of baskets made in each game.
Adding a single variable
There are three main ways of adding a variable. Similar to the case of adding observations, you can use either the cbind() function or the indices.
You also can use the dollar sign to add an extra variable. Imagine that Granny asked you to add the number of baskets of her friend Gabrielle to the data frame. First, you would create a vector with that data like this:
> baskets.of.Gabrielle <- c(11, 5, 6, 7, 3, 12, 4, 5, 9)
To create an extra variable named Gabrielle with that data, you simply do the following:
> baskets.df$Gabrielle <- baskets.of.Gabrielle
If you want to check whether this worked, but you don’t want to display the complete data frame, you could use the head() function. This function takes two arguments: the object you want to display and the number of rows you want to see. To see the first four rows of the new data frame, baskets.df, use the following code:
> head(baskets.df, 4)
    Granny Geraldine Gabrielle
1st     12         5        11
2nd      4         4         5
3rd      5         2         6
4th      6         4         7
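The indices work in much the same way as the dollar sign, and either approach can also hold a variable calculated from the others. The following is a minimal sketch, not the full dataset: it assumes a four-row stand-in for baskets.df built from the rows shown in the head() output, and the Total column is a hypothetical name chosen for illustration.

```r
# Stand-in for baskets.df: only the four rows shown in the head() output
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6),
  Geraldine = c(5, 4, 2, 4),
  row.names = c("1st", "2nd", "3rd", "4th")
)

# Add a variable through the indices by assigning to a column name
# that doesn't exist yet
baskets.df[["Gabrielle"]] <- c(11, 5, 6, 7)

# Calculate a new variable from the existing ones:
# the total number of baskets made in each game
baskets.df$Total <- rowSums(baskets.df[, c("Granny", "Geraldine", "Gabrielle")])

print(baskets.df)
```

Because rowSums() works row by row, each value of Total is the sum of the three players’ baskets for that game.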
Adding multiple variables using cbind
You can pretend your data frame is a matrix and use the cbind() function to add several variables at once. Unlike when you use rbind() on data frames, you don’t even need to worry about the row or column names. Let’s create a new data frame with the baskets for Gertrude and Guinevere. To combine both into a data frame, try:
> new.df <- data.frame(
+   Gertrude = c(3, 5, 2, 1, NA, 3, 1, 1, 4),
+   Guinevere = c(6, 9, 7, 3, 3, 6, 2, 10, 6)
+ )
Although the row names of the data frames new.df and baskets.df differ, R will ignore this and just use the row names of the first data frame in the cbind() function, as you can see from the output of the following code:
> head(cbind(baskets.df, new.df), 4)
    Granny Geraldine Gabrielle Gertrude Guinevere
1st     12         5        11        3         6
2nd      4         4         5        5         9
3rd      5         2         6        2         7
4th      6         4         7        1         3
When using a data frame or a matrix with column names, R will use those as the names of the variables. If you use cbind() to add a vector to a data frame, R will use the vector’s name as a variable name unless you specify one yourself, as you did with rbind().
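The naming behavior for vectors can be sketched as follows. This example assumes a four-row stand-in for baskets.df taken from the output shown earlier; by.vector.name and by.given.name are hypothetical object names used only here.

```r
# Stand-in for baskets.df: only the four rows shown in the earlier output
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6),
  Geraldine = c(5, 4, 2, 4),
  row.names = c("1st", "2nd", "3rd", "4th")
)

# First four values of Gabrielle's baskets
baskets.of.Gabrielle <- c(11, 5, 6, 7)

# Without an explicit name, the vector's own name becomes the variable name
by.vector.name <- cbind(baskets.df, baskets.of.Gabrielle)

# With name = value, you choose the variable name yourself
by.given.name <- cbind(baskets.df, Gabrielle = baskets.of.Gabrielle)

print(names(by.vector.name))
print(names(by.given.name))
```

Supplying the name inside the cbind() call saves you from renaming the column afterward.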
If you bind a matrix without column names to the data frame, R automatically uses the column numbers as names. That will cause a bit of trouble though, because plain numbers are invalid object names and, hence, more difficult to use as variable names. In this case, you’d better use the indices.
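One way to sidestep the naming problem is to assign the matrix columns through the indices, one proper name at a time. This is a sketch under assumptions: baskets.df is a four-row stand-in, and new.baskets is a hypothetical matrix holding the first four games of Gertrude and Guinevere.

```r
# Stand-in for baskets.df: only the four rows shown in the earlier output
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6),
  Geraldine = c(5, 4, 2, 4),
  row.names = c("1st", "2nd", "3rd", "4th")
)

# A matrix without column names (hypothetical object for illustration)
new.baskets <- matrix(c(3, 5, 2, 1,
                        6, 9, 7, 3), ncol = 2)

# Binding it directly leaves R to pick names for the new columns
print(names(cbind(baskets.df, new.baskets)))

# Using the indices instead lets you give each column a valid name
baskets.df[["Gertrude"]]  <- new.baskets[, 1]
baskets.df[["Guinevere"]] <- new.baskets[, 2]

print(names(baskets.df))
```

Alternatively, you could set the matrix’s column names with colnames() before calling cbind().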
Whenever you want to use a data frame and don’t want to continuously have to type its name followed by $, you can use the functions with() and within(). With the within() function, you also can easily add variables to a data frame.
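As a rough illustration of the difference between the two functions, the sketch below again assumes a four-row stand-in for baskets.df; game.totals and the Total column are hypothetical names.

```r
# Stand-in for baskets.df: only the four rows shown in the earlier output
baskets.df <- data.frame(
  Granny    = c(12, 4, 5, 6),
  Geraldine = c(5, 4, 2, 4),
  row.names = c("1st", "2nd", "3rd", "4th")
)

# with() evaluates an expression using the columns as if they were
# ordinary variables; the data frame itself is not changed
game.totals <- with(baskets.df, Granny + Geraldine)
print(game.totals)

# within() returns a modified copy of the data frame, so assign the
# result back to keep the new variable
baskets.df <- within(baskets.df, Total <- Granny + Geraldine)
print(baskets.df)
```

Note that within() does not modify baskets.df in place; without the assignment, the new variable would be lost.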