Press Ctrl+A to highlight all the data, and press Ctrl+C to copy to the clipboard. Then use the read.csv()
function to read the data into R:
cdNOW <- read.csv("clipboard", header=FALSE, sep = "")
Here’s how to name the columns:
colnames(cdNOW) <- c("CustomerID","InvoiceDate","Quantity","Amount")
The data should look like this:
> head(cdNOW)
  CustomerID InvoiceDate Quantity Amount
1          1    19970101        1  11.77
2          2    19970112        1  12.00
3          2    19970112        5  77.00
4          3    19970102        2  20.76
5          3    19970330        2  20.76
6          3    19970402        2  19.54
It’s less complicated than the Online Retail project because Amount is the total amount of the transaction. So each row is a transaction, and aggregation is not necessary. The Quantity column is irrelevant for our purposes.
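If the clipboard route doesn't cooperate (the "clipboard" file name is handled differently across operating systems, and macOS users typically need pipe("pbpaste") instead), the loading and naming steps can still be tried out on a few rows typed in by hand. Here's a minimal sketch, assuming the six rows shown above stand in for the full data set. The names sample_text and cd_sample are made up for illustration and aren't part of the project.

```r
# Illustrative stand-in for the clipboard contents: the six rows shown above.
sample_text <- "
1 19970101 1 11.77
2 19970112 1 12.00
2 19970112 5 77.00
3 19970102 2 20.76
3 19970330 2 20.76
3 19970402 2 19.54
"

# sep = "" splits on any run of whitespace, as in the clipboard call above.
cd_sample <- read.csv(text = sample_text, header = FALSE, sep = "")
colnames(cd_sample) <- c("CustomerID", "InvoiceDate", "Quantity", "Amount")

# Check the column types: the date is still a plain integer at this point.
str(cd_sample)
```

The text argument is passed through to read.table(), so the same call pattern works for a file name once the data is saved to disk.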
Here’s a hint about reformatting the InvoiceDate:
The easiest way to get it into R date format is to download and install the lubridate package, load it, and use its ymd() function:
library(lubridate)
cdNOW$InvoiceDate <- ymd(cdNOW$InvoiceDate)
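If you'd rather not install another package, base R can do the same conversion. This is a sketch of an alternative, not the route the project takes: it assumes the dates are integers in yyyymmdd form, and the names raw_dates and parsed are illustrative.

```r
# Illustrative values in the same yyyymmdd integer form as the InvoiceDate column.
raw_dates <- c(19970101L, 19970112L, 19970330L)

# as.Date() wants a character string plus a format describing its layout:
# %Y = four-digit year, %m = two-digit month, %d = two-digit day.
parsed <- as.Date(as.character(raw_dates), format = "%Y%m%d")

class(parsed)
format(parsed)
```

Either way, the column ends up with class Date, which is what matters for computing recency later on.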
After that change, here’s how the first six rows look:
> head(cdNOW)
  CustomerID InvoiceDate Quantity Amount
1          1  1997-01-01        1  11.77
2          2  1997-01-12        1  12.00
3          2  1997-01-12        5  77.00
4          3  1997-01-02        2  20.76
5          3  1997-03-30        2  20.76
6          3  1997-04-02        2  19.54
Almost there. What’s missing for findRFM()? An invoice number. So you have to use a little trick to make one up. The trick is to use each row’s identifier (its row name) as the invoice number. To turn the row identifiers into a data frame column, download and install the tibble package, load it, and use its rownames_to_column() function:
library(tibble)
cdNOW <- rownames_to_column(cdNOW, "InvoiceNumber")
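The only thing the made-up invoice number has to do is identify each row uniquely, so base R can produce it too. Here's a sketch of that alternative on a tiny data frame; the name df and its three rows are illustrative only.

```r
# Small illustrative data frame; its default row names are "1", "2", "3".
df <- data.frame(CustomerID = c(1, 2, 2), Amount = c(11.77, 12.00, 77.00))

# Put the row names in front as an ordinary character column.
df <- data.frame(InvoiceNumber = rownames(df), df, stringsAsFactors = FALSE)

# Every invoice number should be unique; anyDuplicated() returns 0 when none repeat.
anyDuplicated(df$InvoiceNumber)
```

As with rownames_to_column(), the new column holds character strings rather than numbers.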
Here’s the data:
> head(cdNOW)
  InvoiceNumber CustomerID InvoiceDate Quantity Amount
1             1          1  1997-01-01        1  11.77
2             2          2  1997-01-12        1  12.00
3             3          2  1997-01-12        5  77.00
4             4          3  1997-01-02        2  20.76
5             5          3  1997-03-30        2  20.76
6             6          3  1997-04-02        2  19.54
Now create a data frame with everything but that Quantity column and you’re ready. See how much of the Online Retail project you can accomplish in this one.
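If you're not sure how to leave a column out, here's one way to sketch it, shown on three illustrative rows rather than the full data set. The names cd_sample and cd_rfm are hypothetical; use whatever names fit your own workflow.

```r
# Three illustrative rows with the same columns as the finished cdNOW data frame.
cd_sample <- data.frame(
  InvoiceNumber = c("1", "2", "3"),
  CustomerID    = c(1, 2, 2),
  InvoiceDate   = as.Date(c("1997-01-01", "1997-01-12", "1997-01-12")),
  Quantity      = c(1, 1, 5),
  Amount        = c(11.77, 12.00, 77.00),
  stringsAsFactors = FALSE
)

# Keep every column whose name is not "Quantity".
cd_rfm <- cd_sample[, names(cd_sample) != "Quantity"]

names(cd_rfm)
```

Selecting by name rather than by position keeps the code working even if the column order changes.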
Happy analyzing!