R Project for RFM Analysis: Another Data Set

Updated: 2018-04-11 15:01:44
If you’re interested in trying out your RFM analysis skills on another set of data, this R project is for you. The CDNOW data set consists of almost 70,000 rows. It’s a record of sales at CDNOW from the beginning of January 1997 through the end of June 1998.

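With the data file open, press Ctrl+A to select all the data, and press Ctrl+C to copy it to the clipboard. Then use the read.csv() function to read the data into R. (The "clipboard" file name is not available on macOS, where pipe("pbpaste") takes its place.)

cdNOW <- read.csv("clipboard", header = FALSE, sep = "")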

Here’s how to name the columns:

colnames(cdNOW) <- c("CustomerID","InvoiceDate","Quantity","Amount")
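If copying through the clipboard isn’t convenient, the reading and naming steps can also be combined. The following is a minimal sketch, not the article’s method: it assumes the data is whitespace-delimited with no header row, uses read.table() (the function that read.csv() wraps) with its col.names argument, and reads a few sample rows from a text string instead of the full file. The names raw_text and cdNOW_sample are made up for illustration.

```r
# Sample rows in the same layout as the CDNOW data (whitespace-delimited, no header)
raw_text <- "
1 19970101 1 11.77
2 19970112 1 12.00
2 19970112 5 77.00
3 19970102 2 20.76
3 19970330 2 20.76
3 19970402 2 19.54
"

# read.table() defaults to header = FALSE and sep = "" (any run of whitespace);
# col.names assigns the column names while the data is being read
cdNOW_sample <- read.table(
  text = raw_text,
  col.names = c("CustomerID", "InvoiceDate", "Quantity", "Amount")
)

str(cdNOW_sample)
```

To read a saved copy of the data instead, replace the text argument with the path to the file.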

The data should look like this:

> head(cdNOW)
  CustomerID InvoiceDate Quantity Amount
1          1    19970101        1  11.77
2          2    19970112        1  12.00
3          2    19970112        5  77.00
4          3    19970102        2  20.76
5          3    19970330        2  20.76
6          3    19970402        2  19.54

It’s less complicated than the Online Retail project because Amount is the total amount of the transaction. So each row is a transaction, and aggregation is not necessary. The Quantity column is irrelevant for our purposes.

Here’s a hint about reformatting the InvoiceDate: The easiest way to get it into R date format is to download and install the lubridate package and use its ymd() function:

library(lubridate)
cdNOW$InvoiceDate <- ymd(cdNOW$InvoiceDate)
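If you’d rather not install a package, base R can do the same conversion. This is a sketch of an alternative, not the approach the project uses: as.Date() needs the yyyymmdd numbers turned into character strings first, along with a format that describes them. The vector invoice_dates is a made-up sample of values in the same form as the InvoiceDate column.

```r
# Dates stored as integers in yyyymmdd form, like the InvoiceDate column
invoice_dates <- c(19970101L, 19970112L, 19970330L)

# Convert to character, then tell as.Date() how the digits are laid out:
# %Y = four-digit year, %m = two-digit month, %d = two-digit day
converted <- as.Date(as.character(invoice_dates), format = "%Y%m%d")

print(converted)
class(converted)
```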

After that change, here’s how the first six rows look:

> head(cdNOW)
  CustomerID InvoiceDate Quantity Amount
1          1  1997-01-01        1  11.77
2          2  1997-01-12        1  12.00
3          2  1997-01-12        5  77.00
4          3  1997-01-02        2  20.76
5          3  1997-03-30        2  20.76
6          3  1997-04-02        2  19.54

Almost there. What’s missing for findRFM()? An invoice number. So you have to use a little trick to make one up. The trick is to use each row’s identifier (its row name) as the invoice number. To turn the row names into a data frame column, download and install the tibble package and use its rownames_to_column() function:

library(tibble)
cdNOW <- rownames_to_column(cdNOW, "InvoiceNumber")
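The same trick can be sketched in base R, assuming the row names are just the default 1, 2, 3, and so on. The example below builds the invoice number from the row count and puts it in the first column. It stores the number as character because rownames_to_column() also produces a character column. The data frame sample_sales is made up for illustration.

```r
# A small stand-in for the cdNOW data frame
sample_sales <- data.frame(
  CustomerID = c(1, 2, 2),
  Amount = c(11.77, 12.00, 77.00)
)

# Number the rows 1..n and place that column ahead of the existing ones
sample_sales <- data.frame(
  InvoiceNumber = as.character(seq_len(nrow(sample_sales))),
  sample_sales,
  stringsAsFactors = FALSE
)

print(sample_sales)
```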

Here’s the data:

> head(cdNOW)
  InvoiceNumber CustomerID InvoiceDate Quantity Amount
1             1          1  1997-01-01        1  11.77
2             2          2  1997-01-12        1  12.00
3             3          2  1997-01-12        5  77.00
4             4          3  1997-01-02        2  20.76
5             5          3  1997-03-30        2  20.76
6             6          3  1997-04-02        2  19.54

Now create a data frame with everything but that Quantity column and you’re ready.
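One way to drop the column, shown here as a sketch on a made-up three-row data frame (sample_sales and sales_no_qty are illustrative names), is to keep only the columns whose name is not Quantity:

```r
# A small stand-in with the same columns as cdNOW at this point
sample_sales <- data.frame(
  InvoiceNumber = c("1", "2", "3"),
  CustomerID = c(1, 2, 2),
  InvoiceDate = as.Date(c("1997-01-01", "1997-01-12", "1997-01-12")),
  Quantity = c(1, 1, 5),
  Amount = c(11.77, 12.00, 77.00),
  stringsAsFactors = FALSE
)

# Logical column selection: TRUE for every column except Quantity
sales_no_qty <- sample_sales[, names(sample_sales) != "Quantity"]

names(sales_no_qty)
```

Selecting by name rather than by position keeps the code working even if the column order changes.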

See how much of the Online Retail project you can accomplish in this one.
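As a sanity check on whatever findRFM() returns, the raw recency, frequency, and monetary values can be computed by hand. The sketch below is not findRFM() and does no scoring. It assumes recency is counted in days back from the latest date in the data and monetary is each customer’s total spend, and it runs on the six sample rows shown earlier. The names sales, reference_date, and rfm are made up for illustration.

```r
# The six sample transactions shown in the head(cdNOW) output
sales <- data.frame(
  CustomerID = c(1, 2, 2, 3, 3, 3),
  InvoiceDate = as.Date(c("1997-01-01", "1997-01-12", "1997-01-12",
                          "1997-01-02", "1997-03-30", "1997-04-02")),
  Amount = c(11.77, 12.00, 77.00, 20.76, 20.76, 19.54)
)

# Assumption: recency is measured from the most recent date in the data
reference_date <- max(sales$InvoiceDate)
sales$DaysAgo <- as.numeric(reference_date - sales$InvoiceDate)

# One value per customer; tapply() returns groups in sorted CustomerID order
recency   <- tapply(sales$DaysAgo, sales$CustomerID, min)    # days since last purchase
frequency <- tapply(sales$Amount,  sales$CustomerID, length) # number of transactions
monetary  <- tapply(sales$Amount,  sales$CustomerID, sum)    # total amount spent

rfm <- data.frame(
  CustomerID = as.numeric(names(recency)),
  Recency = as.vector(recency),
  Frequency = as.vector(frequency),
  Monetary = as.vector(monetary)
)

print(rfm)
```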

Happy analyzing!

About This Article

About the book author:

Joseph Schmuller, PhD, is a cognitive scientist and statistical analyst. He creates online learning tools and writes books on the technology of data science. His books include R All-in-One For Dummies and R Projects For Dummies.