Base R Statistical Functions

Statistical Analysis with R Essentials For Dummies

Here's a selection of statistical functions that come with the standard R installation. You'll find many others in R packages.

**Central Tendency and Variability**
Function	What it Calculates
mean(x)	Mean of the numbers in vector x
median(x)	Median of the numbers in vector x
var(x)	Estimated variance of the population from which the numbers in vector x are sampled (computed with n - 1 in the denominator)
sd(x)	Estimated standard deviation of the population from which the numbers in vector x are sampled (the square root of var(x))
scale(x)	Standard scores (z-scores) for the numbers in vector x, returned as a one-column matrix
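As a quick illustration of the functions in this table, here's a minimal sketch. The vector x holds made-up numbers chosen only so the arithmetic is easy to check; it isn't data from the book.

```r
# Hypothetical sample data, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

m   <- mean(x)     # 5
med <- median(x)   # 4.5
v   <- var(x)      # 32/7, about 4.57 (n - 1 in the denominator)
s   <- sd(x)       # square root of var(x), about 2.14

# scale() returns a one-column matrix; as.vector() flattens it
z <- as.vector(scale(x))

# The z-scores match the hand calculation (x - mean) / sd
all.equal(z, (x - m) / s)
```

Because scale() hands back a matrix with attributes, wrapping it in as.vector() is one convenient way to get a plain numeric vector of z-scores.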

**Relative Standing**
Function	What it Calculates
sort(x)	The numbers in vector x in increasing order
sort(x)[n]	The nth smallest number in vector x
rank(x)	Ranks of the numbers (in increasing order) in vector x
rank(-x)	Ranks of the numbers (in decreasing order) in vector x
rank(x, ties.method = "average")	Ranks of the numbers (in increasing order) in vector x, with tied numbers given the average of the ranks that the ties would have attained
rank(x, ties.method = "min")	Ranks of the numbers (in increasing order) in vector x, with tied numbers given the minimum of the ranks that the ties would have attained
rank(x, ties.method = "max")	Ranks of the numbers (in increasing order) in vector x, with tied numbers given the maximum of the ranks that the ties would have attained
quantile(x)	The 0th, 25th, 50th, 75th, and 100th percentiles (i.e., the quartiles) of the numbers in vector x. (That's not a misprint: quantile(x) returns the quartiles of x.)
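To see how the ranking options treat ties, here's a short sketch with a made-up vector that contains one tied pair. The data are hypothetical, and the outputs in the comments assume R's default settings.

```r
# Hypothetical data with a tie (two 20s)
x <- c(30, 10, 20, 20, 40)

sorted   <- sort(x)                      # 10 20 20 30 40
second   <- sort(x)[2]                   # 20, the 2nd smallest number
r_avg    <- rank(x)                      # 4.0 1.0 2.5 2.5 5.0 ("average" is the default)
r_desc   <- rank(-x)                     # 2.0 5.0 3.5 3.5 1.0
r_min    <- rank(x, ties.method = "min") # 4 1 2 2 5
r_max    <- rank(x, ties.method = "max") # 4 1 3 3 5

# With no other arguments, quantile() returns the quartiles
q <- quantile(x)                         # 0%: 10, 25%: 20, 50%: 20, 75%: 30, 100%: 40

# Other percentiles are available through the probs argument
q_tails <- quantile(x, probs = c(0.10, 0.90))
```

The tied 20s would have occupied ranks 2 and 3, so "average" gives each 2.5, "min" gives each 2, and "max" gives each 3.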

**t-tests**
Function	What it Calculates
t.test(x,mu=n, alternative = "two.sided")	Two-tailed t-test that the mean of the numbers in vector x is different from n.
t.test(x,mu=n, alternative = "greater")	One-tailed t-test that the mean of the numbers in vector x is greater than n.
t.test(x,mu=n, alternative = "less")	One-tailed t-test that the mean of the numbers in vector x is less than n.
t.test(x,y,mu=0, var.equal = TRUE, alternative = "two.sided")	Two-tailed t-test that the mean of the numbers in vector x is different from the mean of the numbers in vector y. The variances in the two vectors are assumed to be equal.
t.test(x,y,mu=0, alternative = "two.sided", paired = TRUE)	Two-tailed t-test that the mean of the numbers in vector x is different from the mean of the numbers in vector y. The vectors represent matched samples.
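Here's a minimal sketch of three of the calls in this table. The vectors x and y are invented scores (eight per sample), and the test value of 5 stands in for n; none of these numbers come from the book.

```r
# Hypothetical scores, for illustration only
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.4)
y <- c(4.8, 4.7, 5.2, 5.5, 5.9, 5.0, 5.3, 5.1)

# One-sample, two-tailed test against a hypothesized mean of 5
one_sample <- t.test(x, mu = 5, alternative = "two.sided")

# Independent samples, equal variances assumed (pooled variance)
pooled <- t.test(x, y, mu = 0, var.equal = TRUE, alternative = "two.sided")

# Matched samples: each x is paired with the y in the same position
paired <- t.test(x, y, mu = 0, alternative = "two.sided", paired = TRUE)

# Each result is a list-like object; pull out pieces by name
one_sample$statistic   # t value
one_sample$p.value     # p-value
pooled$parameter       # degrees of freedom: 8 + 8 - 2 = 14
paired$parameter       # degrees of freedom: 8 - 1 = 7
```

Typing the name of a result (for example, pooled) prints the full report, including the confidence interval.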

**Analysis of Variance (ANOVA)**
Function	What it Calculates
aov(y~x, data = d)	Single-factor ANOVA, with the numbers in vector y as the dependent variable and the elements of vector x as the levels of the independent variable. The data are in data frame d.
aov(y~x + Error(w/x), data = d)	Repeated Measures ANOVA, with the numbers in vector y as the dependent variable and the elements in vector x as the levels of an independent variable. Error(w/x) indicates that each element in vector w experiences all the levels of x (i.e., x is a repeated measure). The data are in data frame d.
aov(y~x*z, data = d)	Two-factor ANOVA, with the numbers in vector y as the dependent variable and the elements of vectors x and z as the levels of the two independent variables. The data are in data frame d.
aov(y~x*z + Error(w/z), data = d)	Mixed ANOVA, with the numbers in vector y as the dependent variable and the elements of vectors x and z as the levels of the two independent variables. Error(w/z) indicates that each element in vector w experiences all the levels of z (i.e., z is a repeated measure). The data are in data frame d.
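The sketch below shows one way the data frame d might be laid out for the first two rows of this table. The column names follow the table (y, x, w); the values are made up, and the grouping columns are stored as factors so that aov() treats them as levels rather than as numbers.

```r
# Single-factor ANOVA: three groups of three hypothetical scores
d <- data.frame(
  y = c(4, 5, 6, 7, 8, 9, 10, 11, 12),
  x = factor(rep(c("a", "b", "c"), each = 3))
)
fit <- aov(y ~ x, data = d)
summary(fit)   # ANOVA table: 2 df for x, 6 df for residuals

# Repeated measures ANOVA: four hypothetical subjects (w),
# each measured at all three levels of x
rm_d <- data.frame(
  w = factor(rep(1:4, times = 3)),
  x = factor(rep(c("t1", "t2", "t3"), each = 4)),
  y = c(10, 12, 11, 9, 13, 14, 13, 11, 15, 17, 14, 13)
)
rm_fit <- aov(y ~ x + Error(w/x), data = rm_d)
summary(rm_fit)   # one table per error stratum
```

The data are in long format: one row per score, with a column identifying the subject and a column identifying the condition.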

**Correlation and Regression**
Function	What it Calculates
cor(x,y)	Correlation coefficient between the numbers in vector x and the numbers in vector y
cor.test(x,y)	Correlation coefficient between the numbers in vector x and the numbers in vector y, along with a t-test of the significance of the correlation coefficient.
lm(y~x, data = d)	Linear regression analysis with the numbers in vector y as the dependent variable and the numbers in vector x as the independent variable. Data are in data frame d.
coefficients(a)	Slope and intercept of linear regression model a.
confint(a)	Confidence intervals of the slope and intercept of linear regression model a
lm(y~x+z, data = d)	Multiple regression analysis with the numbers in vector y as the dependent variable and the numbers in vectors x and z as the independent variables. Data are in data frame d.
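As a small illustration of the two correlation functions, here's a sketch with invented paired measurements. The vectors are hypothetical and chosen only to show where the pieces of the result live.

```r
# Hypothetical paired measurements, for illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

r  <- cor(x, y)        # Pearson correlation coefficient by default
ct <- cor.test(x, y)   # same coefficient plus a significance test

ct$estimate    # the correlation coefficient
ct$statistic   # t value
ct$parameter   # degrees of freedom: number of pairs - 2 = 4
ct$p.value     # p-value
```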

When you carry out an ANOVA or a regression analysis, store the result in an object (R returns it as a list). For example, a <- lm(y~x, data = d)

Then, to see the tabled results, use the summary() function:

summary(a)
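Putting the regression pieces together, a minimal sketch might look like this. The data frame d and its values are hypothetical; the column names y, x, and z simply mirror the ones used in the table.

```r
# Hypothetical data frame, for illustration only
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  z = c(2, 1, 4, 3, 6, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

# Store the fitted model, then query it
a <- lm(y ~ x, data = d)

coefs <- coefficients(a)   # named vector: "(Intercept)" and "x"
ci    <- confint(a)        # 95% confidence intervals by default
summary(a)                 # coefficient table, R-squared, F-test

# Multiple regression adds a second predictor
b <- lm(y ~ x + z, data = d)
coefficients(b)            # intercept plus one slope each for x and z
```

Because the model is stored in a, you can return to it as often as you like without refitting.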

About This Article

About the book author:

Joseph Schmuller, PhD, is a cognitive scientist and statistical analyst. He creates online learning tools and writes books on the technology of data science. His books include R All-in-One For Dummies and R Projects For Dummies.

This article can be found in the category:

R
