Interacting with users with R functions
R provides the shiny package and the shinydashboard package for developing interactive applications. Here’s a selection of statistical functions that come with the standard R installation. You’ll find many others in R packages:
Function | What it Calculates |
mean(x) | Mean of the numbers in vector x |
median(x) | Median of the numbers in vector x |
var(x) | Estimated variance of the population from which the numbers in vector x are sampled |
sd(x) | Estimated standard deviation of the population from which the numbers in vector x are sampled |
scale(x) | Standard scores (z-scores) for the numbers in vector x |
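As a minimal sketch of the functions in this table, the following runs them on a small vector. The scores are made up for illustration, and the variable names are assumptions rather than anything from the original.

```r
# Hypothetical sample of exam scores (illustrative data only)
x <- c(72, 85, 90, 66, 85, 78, 94)

x_mean   <- mean(x)     # arithmetic mean
x_median <- median(x)   # middle value of the sorted scores
x_var    <- var(x)      # sample variance (divides by n - 1)
x_sd     <- sd(x)       # sample standard deviation, the square root of var(x)

# scale() returns a one-column matrix; as.vector() flattens it to z-scores
z <- as.vector(scale(x))

print(c(mean = x_mean, median = x_median, var = x_var, sd = x_sd))
print(round(z, 2))
```

Because var() and sd() divide by n - 1, they estimate the population values rather than describe the sample itself, which is what the table's wording refers to.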
Relative Standing
Function | What it Calculates |
sort(x) | The numbers in vector x in increasing order |
sort(x)[n] | The nth smallest number in vector x |
rank(x) | Ranks of the numbers (in increasing order) in vector x |
rank(-x) | Ranks of the numbers (in decreasing order) in vector x |
rank(x, ties.method = "average") | Ranks of the numbers (in increasing order) in vector x, with tied numbers given the average of the ranks that the ties would have attained |
rank(x, ties.method = "min") | Ranks of the numbers (in increasing order) in vector x, with tied numbers given the minimum of the ranks that the ties would have attained |
rank(x, ties.method = "max") | Ranks of the numbers (in increasing order) in vector x, with tied numbers given the maximum of the ranks that the ties would have attained |
quantile(x) | The 0th, 25th, 50th, 75th, and 100th percentiles (i.e., the quartiles) of the numbers in vector x. (That’s not a misprint: quantile(x) returns the quartiles of x.) |
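The tie-handling options are easiest to see on a vector that contains a tie. This short sketch uses made-up numbers, with the value 10 appearing twice:

```r
# Illustrative vector with one tie (10 appears twice)
x <- c(40, 10, 30, 10, 20)

sorted   <- sort(x)                        # 10 10 20 30 40
second   <- sort(x)[2]                     # 10, the 2nd smallest value
r_avg    <- rank(x)                        # 5 1.5 4 1.5 3 ("average" is the default)
r_min    <- rank(x, ties.method = "min")   # 5 1 4 1 3
r_max    <- rank(x, ties.method = "max")   # 5 2 4 2 3
r_desc   <- rank(-x)                       # 1 4.5 2 4.5 3 (rank 1 = largest)
q        <- quantile(x)                    # 0%: 10, 25%: 10, 50%: 20, 75%: 30, 100%: 40

print(r_avg)
print(q)
```

The two tied values would have occupied ranks 1 and 2, so "average" assigns 1.5 to both, "min" assigns 1, and "max" assigns 2.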
T-tests
Function | What it Calculates |
t.test(x, mu = n, alternative = "two.sided") | Two-tailed t-test that the mean of the numbers in vector x is different from n |
t.test(x, mu = n, alternative = "greater") | One-tailed t-test that the mean of the numbers in vector x is greater than n |
t.test(x, mu = n, alternative = "less") | One-tailed t-test that the mean of the numbers in vector x is less than n |
t.test(x, y, mu = 0, var.equal = TRUE, alternative = "two.sided") | Two-tailed t-test that the mean of the numbers in vector x is different from the mean of the numbers in vector y. The variances in the two vectors are assumed to be equal. |
t.test(x, y, mu = 0, alternative = "two.sided", paired = TRUE) | Two-tailed t-test that the mean of the numbers in vector x is different from the mean of the numbers in vector y. The vectors represent matched samples. |
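To show how these calls fit together, here is a small sketch on invented before/after measurements. The data and the comparison value of 12 are assumptions chosen only for illustration.

```r
# Hypothetical matched measurements on six subjects (illustrative data)
before <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5)
after  <- c(11.6, 11.0, 12.4, 12.5, 11.2, 12.1)

# One-sample tests against a hypothesized mean of 12
two_tailed <- t.test(before, mu = 12, alternative = "two.sided")
one_tailed <- t.test(before, mu = 12, alternative = "greater")

# Two-sample test assuming equal variances (independent samples)
equal_var <- t.test(before, after, mu = 0, var.equal = TRUE,
                    alternative = "two.sided")

# Paired test treating the vectors as matched samples
paired <- t.test(before, after, mu = 0, alternative = "two.sided",
                 paired = TRUE)

# Each result is a list; pull out the pieces you need
print(paired$statistic)   # t value
print(paired$parameter)   # degrees of freedom
print(paired$p.value)
```

The same two vectors give different degrees of freedom depending on the design: n1 + n2 - 2 for the equal-variance test and n - 1 for the paired test.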
Analysis of Variance (ANOVA)
Function | What it Calculates |
aov(y~x, data = d) | Single-factor ANOVA, with the numbers in vector y as the dependent variable and the elements of vector x as the levels of the independent variable. The data are in data frame d. |
aov(y~x + Error(w/x), data = d) | Repeated Measures ANOVA, with the numbers in vector y as the dependent variable and the elements in vector x as the levels of an independent variable. Error(w/x) indicates that each element in vector w experiences all the levels of x (i.e., x is a repeated measure). The data are in data frame d. |
aov(y~x*z, data = d) | Two-factor ANOVA, with the numbers in vector y as the dependent variable and the elements of vectors x and z as the levels of the two independent variables. The data are in data frame d. |
aov(y~x*z + Error(w/z), data = d) | Mixed ANOVA, with the numbers in vector y as the dependent variable and the elements of vectors x and z as the levels of the two independent variables. Error(w/z) indicates that each element in vector w experiences all the levels of z (i.e., z is a repeated measure). The data are in data frame d. |
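The following sketch builds a small data frame in the shape these formulas expect and runs the single-factor and repeated-measures versions. The scores are invented, and the column names w, x, and y simply mirror the table.

```r
# Hypothetical data: 4 subjects (w), each measured under 3 conditions (x)
d <- data.frame(
  w = factor(rep(1:4, times = 3)),               # subject identifier
  x = factor(rep(c("a", "b", "c"), each = 4)),   # condition (the factor)
  y = c(5, 6, 4, 7,   8, 9, 7, 9,   11, 12, 10, 13)
)

# Single-factor ANOVA: treats the 12 scores as independent
one_way <- aov(y ~ x, data = d)
print(summary(one_way))

# Repeated-measures ANOVA: Error(w/x) says each subject sees every level of x
repeated <- aov(y ~ x + Error(w/x), data = d)
print(summary(repeated))
```

Note that the grouping variables are stored as factors. If x were left numeric, aov() would fit a single slope for it instead of comparing levels.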
Correlation and Regression
Function | What it Calculates |
cor(x,y) | Correlation coefficient between the numbers in vector x and the numbers in vector y |
cor.test(x,y) | Correlation coefficient between the numbers in vector x and the numbers in vector y, along with a t-test of the significance of the correlation coefficient |
lm(y~x, data = d) | Linear regression analysis with the numbers in vector y as the dependent variable and the numbers in vector x as the independent variable. Data are in data frame d. |
coefficients(a) | Slope and intercept of linear regression model a |
confint(a) | Confidence intervals of the slope and intercept of linear regression model a |
lm(y~x+z, data = d) | Multiple regression analysis with the numbers in vector y as the dependent variable and the numbers in vectors x and z as the independent variables. Data are in data frame d. |
When you carry out an ANOVA or a regression analysis, store the analysis in a list. For example, a <- lm(y~x, data = d). Then, to see the tabled results, use the summary() function:
summary(a)
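Putting the regression functions together, a worked sketch might look like this. The data frame is invented for illustration; only the function calls come from the table above.

```r
# Hypothetical data frame with one outcome (y) and two predictors (x, z)
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  z = c(2, 1, 4, 3, 6, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

# Correlation, with and without a significance test
r      <- cor(d$x, d$y)
r_test <- cor.test(d$x, d$y)

# Simple regression, stored in a so it can be queried afterward
a <- lm(y ~ x, data = d)
print(coefficients(a))   # intercept and slope
print(confint(a))        # 95% confidence intervals by default
print(summary(a))        # full table of results

# Multiple regression with a second predictor
b <- lm(y ~ x + z, data = d)
print(coefficients(b))
```

Storing the model first is what makes the follow-up calls possible: coefficients(), confint(), and summary() all take the stored object as their argument.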
Tackling machine learning with R
Machine Learning (ML) is a popular area. R provides a number of ML-related packages and functions. Here are some of them:
Machine Learning Packages and Functions
Package | Function | What it does |
rattle | rattle() | Opens the Rattle Graphical User Interface |
rpart | rpart() | Creates a decision tree |
rpart.plot | prp() | Draws a decision tree |
randomForest | randomForest() | Creates a random forest of decision trees |
rattle | printRandomForests() | Prints the rules of a forest’s individual decision trees |
e1071 | svm() | Trains a support vector machine |
e1071 | predict() | Creates a vector of predicted classifications based on a support vector machine |
kernlab | ksvm() | Trains a support vector machine |
base R | kmeans() | Creates a k-means clustering analysis |
nnet | nnet() | Creates a neural network with one hidden layer |
NeuralNetTools | plotnet() | Draws a neural network |
nnet | predict() | Creates a vector of predictions based on a neural network |
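As a rough sketch of two entries from this table, the code below clusters the built-in iris data with kmeans() and fits a decision tree with rpart(). The choice of iris, three clusters, and the seed are assumptions made for the example. rpart is a recommended package that normally ships with R, but it may need installing on a minimal setup.

```r
library(rpart)   # recommended package; normally installed alongside R

# k-means clustering on the four numeric iris measurements
set.seed(42)                          # k-means starts from random centers
features <- scale(iris[, 1:4])        # standardize so no column dominates
km <- kmeans(features, centers = 3, nstart = 25)

# Compare the clusters with the known species (labels were not used to fit)
print(table(cluster = km$cluster, species = iris$Species))

# Decision tree that predicts species from the other columns
tree <- rpart(Species ~ ., data = iris, method = "class")
pred <- predict(tree, iris, type = "class")

# Proportion classified correctly on the training data
accuracy <- mean(pred == iris$Species)
print(accuracy)
```

The accuracy here is measured on the same rows the tree was trained on, so it is optimistic. A held-out test set would give a fairer estimate.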
Working with large(ish) databases in R
Created for statistical analysis, R has a wide array of packages and functions for dealing with large amounts of data. This selection is the tip of the iceberg’s tip:
Packages and Functions for Exploring Databases
Package | Function | What it does |
didrooRFM | findRFM() | Performs a Recency, Frequency, Money analysis on a database of retail transactions |
vcd | assocstats() | Calculates statistics for tables of categorical data |
vcd | assoc() | Creates a graphic that shows deviations from independence in a table of categorical data |
tidyverse | glimpse() | Provides a partial view of a data frame with the columns appearing onscreen as rows |
plotrix | std.error() | Calculates the standard error of the mean |
dplyr | inner_join() | Joins data frames |
lubridate | wday() | Returns day of the week of a calendar date |
lubridate | ymd() | Returns a date in R date-format |
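Here is a small sketch that combines the joining and date functions from this table. The two data frames and their column names are hypothetical, and the code assumes the dplyr and lubridate packages are installed.

```r
library(dplyr)       # inner_join(), glimpse()
library(lubridate)   # ymd(), wday()

# Hypothetical transaction and customer tables
orders <- data.frame(
  customer_id = c(1, 2, 2, 4),
  order_date  = c("2024-03-15", "2024-03-16", "2024-03-18", "2024-03-20"),
  amount      = c(25.00, 40.50, 12.25, 80.00)
)
customers <- data.frame(
  customer_id = c(1, 2, 3),
  name        = c("Ana", "Ben", "Cy")
)

# Keep only the orders whose customer_id appears in both data frames
joined <- inner_join(orders, customers, by = "customer_id")

# Convert the text dates to Date objects, then extract the weekday
joined$order_date <- ymd(joined$order_date)
joined$weekday    <- wday(joined$order_date, label = TRUE)

# Columns are listed down the screen, one per row
glimpse(joined)
```

The order from customer 4 drops out because that customer has no match in the second table, and customer 3 drops out because there are no orders for them. That is the defining behavior of an inner join.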
Manipulating maps and images with R
Here are some packages and functions to help you get started using R to draw maps and to process images.
Packages and Functions for Plotting Maps and for Processing Images
Package | Function | What it does |
ggplot2 | map_data() | Returns a data frame of latitudes and longitudes (uses map outlines from the maps package) |
ggmap | geocode() | Returns latitude and longitude of a place-name |
magick | image_read() | Reads an image into R and turns it into a magick object |
magick | image_resize() | Resizes an image |
magick | image_rotate() | Rotates an image |
magick | image_flip() | Rotates an image on a horizontal axis |
magick | image_flop() | Rotates an image on a vertical axis |
magick | image_annotate() | Adds text to an image |
magick | image_background() | Sets the background for an image |
magick | image_composite() | Combines images |
magick | image_morph() | Makes one image appear to gradually become (morph into) another |
magick | image_animate() | Puts an animation into the RStudio Viewer window |
magick | image_apply() | Applies a function to every frame in an animated GIF |
magick | image_write() | Saves an animation as a reusable GIF |
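A short sketch of a typical magick workflow follows. It assumes the magick package is installed and uses "logo:", the 640 x 480 sample image built into ImageMagick, so that no external file is needed. The output path is a temporary file chosen for the example.

```r
library(magick)

# Read ImageMagick's built-in sample image (640 x 480 pixels)
img <- image_read("logo:")

# Halve the width; the height scales to keep the aspect ratio
small <- image_resize(img, "320x")

# Rotate 90 degrees clockwise, then mirror left-to-right
turned   <- image_rotate(small, 90)
mirrored <- image_flop(turned)

# Add a text label in the lower-left corner
labeled <- image_annotate(mirrored, "demo", size = 20,
                          color = "red", gravity = "southwest")

# Save the result and inspect its dimensions
out <- tempfile(fileext = ".png")
image_write(labeled, path = out, format = "png")
print(image_info(labeled))
```

Each function returns a new magick object rather than modifying its input, so the intermediate images stay available for comparison.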