This gets into the area of statistical analysis, and at a fairly esoteric level. So let’s take a look at an R capability — the formula.
In this example, let’s say that Temperature depends on Month. Another way to say this is that Temperature is the dependent variable and Month is the independent variable.
An R formula incorporates these concepts and serves as the basis for many of R’s statistical functions and graphing functions. This is the basic structure of an R formula:
function(dependent_var ~ independent_var, data = data.frame)
Read the tilde operator (~) as “depends on.”
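A formula is an ordinary R object, so it can be stored in a variable and inspected before any function uses it. The following is a minimal sketch of that idea; the names f and fit are chosen here for illustration and are not part of the original example.

```r
# A formula can be stored like any other R object.
f <- Temp ~ Month

class(f)      # "formula"
all.vars(f)   # "Temp" "Month"

# The stored formula can then be handed to a modeling function,
# with the data frame that holds the variables it names.
fit <- lm(f, data = airquality)
coef(fit)     # intercept and slope of the fitted line
```

Storing the formula separately is handy when the same relationship is passed to more than one function.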
Here’s how you can address the relationship between Temp and Month:
> analysis <- lm(Temp ~ Month, data=airquality)
The name of the function lm() is an abbreviation for linear model. This means that you expect the temperature to increase linearly (at a constant rate) from month to month. To see the results of the analysis, you can use summary():
> summary(analysis)

Call:
lm(formula = Temp ~ Month, data = airquality)

Residuals:
     Min       1Q   Median       3Q      Max
-20.5263  -6.2752   0.9121   6.2865  17.9121

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  58.2112     3.5191  16.541  < 2e-16 ***
Month         2.8128     0.4933   5.703 6.03e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.614 on 151 degrees of freedom
Multiple R-squared:  0.1772, Adjusted R-squared:  0.1717
F-statistic: 32.52 on 1 and 151 DF,  p-value: 6.026e-08
The Estimate for Month indicates that temperature increases at a rate of 2.8128 degrees Fahrenheit per month between May and September. Along with the Estimate for (Intercept), you can summarize the relationship between Temp and Month as
Temp = 58.2112 + (2.8128 × Month)
where Month is a number from 5 to 9. You might remember from algebra class that when you graph this kind of equation, you get a straight line, hence the term linear model.
Is the linear model a good way to summarize these data? The numbers in the bottom line of the output (the F-statistic and its p-value) say that it is, but I won’t go into the details.
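To make the equation concrete, here is a short sketch that applies it to each month from 5 to 9, once by hand from the fitted coefficients and once with predict(). The variable names months, b, by_hand, and by_predict are illustrative assumptions, not part of the original example.

```r
analysis <- lm(Temp ~ Month, data = airquality)

months <- 5:9
b <- coef(analysis)   # named vector with "(Intercept)" and "Month"

# Apply the equation by hand: intercept + slope * Month
by_hand <- b[["(Intercept)"]] + b[["Month"]] * months

# Ask R to evaluate the fitted model at the same months
by_predict <- predict(analysis, newdata = data.frame(Month = months))

# Side-by-side comparison; the two columns should agree
data.frame(Month = months, by_hand, by_predict)
```

Both columns come from the same straight line, so each step from one month to the next adds the Month estimate to the predicted temperature.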
The output of summary() (and other statistical functions in R) is a list. So if you want to refer to the Estimate for Month, that’s
> s <- summary(analysis)
> s$coefficients[2,1]
[1] 2.812789
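Because the summary is a list, its other components can be pulled out the same way. The sketch below, offered as one possible approach, lists the component names and shows that the coefficient can also be addressed by row and column name rather than by position.

```r
analysis <- lm(Temp ~ Month, data = airquality)
s <- summary(analysis)

names(s)                              # the components stored in the list

s$coefficients["Month", "Estimate"]   # same value as s$coefficients[2,1]
coef(analysis)[["Month"]]             # the slope taken straight from the model

s$r.squared                           # the Multiple R-squared figure
```

Indexing by name tends to be easier to read than [2,1], and it keeps working if the model later gains more terms.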