https://www.wiley.com/Statistical+Analysis+with+R+For+Dummies%2C+2nd+Edition-p-9781394343065
Published: June 9, 2025

Statistical Analysis with R For Dummies

Overview

Simplify stats and learn how to graph, analyze, and interpret data the easy way

Statistical Analysis with R For Dummies makes stats approachable by combining clear explanations with practical applications. You'll learn how to download and use R and RStudio (two free, open-source tools) to learn statistics concepts, create graphs, test hypotheses, and draw meaningful conclusions. Get started by learning the basics of statistics and R, then calculate descriptive statistics and use inferential statistics to test hypotheses. Finally, visualize it all with graphs and charts. This Dummies guide is your well-marked path to sailing through statistics.

  • Get clear explanations of the basics of statistics and data analysis
  • Learn how to analyze and visualize data with R, step by step
  • Create charts, graphs, and summaries to interpret results
  • Explore hypothesis testing and prediction techniques

This is the perfect introduction to R for students, professionals, and the stat-curious.

About The Author

Joseph Schmuller is a cognitive scientist and statistical analyst who creates online learning tools and books on data science. He is the author of R All-in-One For Dummies, all five editions of Statistical Analysis with Excel For Dummies, Statistical Analysis with R For Dummies, and R Projects For Dummies, among others.

Sample Chapters

CHEAT SHEET

R provides a wide array of functions to help you with statistical analysis, from simple statistics to complex analyses. Several statistical functions are built into R and R packages. R statistical functions fall into several categories, including central tendency and variability, relative standing, t-tests, analysis of variance, and regression analysis.

Articles from the book

One reason for the rapid rise of R is the supportive R community. It seems that as soon as someone becomes proficient in R, they immediately want to share their knowledge with others — and the web is the place to do it. This list points you to some of the helpful web-based resources the R community has created.
An important aspect of base R graphics is the ability to add features to a graph after you create it. One way of showing histogram information is to think of the data as probabilities rather than frequencies. So instead of the frequency of a particular price range, you graph the probability that a car selected from the data is in that price range.
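Here is a minimal sketch of that idea in base R, assuming the Cars93 data frame from the MASS package (the data set named elsewhere on this page). The axis label and the choice of a kernel density line as the added feature are illustrative assumptions, not necessarily what the article itself uses. Strictly speaking, the bar heights are densities, so it is the bar areas that add up to 1.

```r
library(MASS)  # Cars93 lives in the MASS package

# probability = TRUE rescales the bars so that their areas sum to 1
price.hist <- hist(Cars93$Price, probability = TRUE,
                   xlab = "Price", main = "Prices of 1993 cars")

# Add a feature after the graph already exists:
# a kernel density estimate drawn on top of the histogram
lines(density(Cars93$Price))
```

Because lines() draws on the current plot, the histogram has to be created first.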
Many times the dependent variable is a data point rather than a frequency. The table shows the data for commercial space revenues for the early 1990s. (The data, by the way, are from the U.S. Department of Commerce, via the Statistical Abstract of the U.S.)
U.S. Commercial Space Revenues 1990–1994 (In Millions of Dollars)
Industry                          1990    1991    1992    1993    1994
Commercial Satellites Delivered   1,000   1,300   1,300   1,100   1,400
Satellite Services                  800   1,200   1,500   1,850   2,330
Satellite Ground Equipment          860   1,300   1,400   1,600   1,970
Commercial Launches                 570     380     450     465     580
Remote Sensing Data                 155     190     210     250     300
The data are the numbers in the cells, which represent revenue in millions of dollars.
Here's a selection of statistical functions that come with the standard R installation. You'll find many others in R packages.
Central Tendency and Variability
Function      What it Calculates
mean(x)       Mean of the numbers in vector x
median(x)     Median of the numbers in vector x
var(x)        Estimated variance of the population from which the numbers in vector x are sampled
sd(x)         Estimated standard deviation of the population from which the numbers in vector x are sampled
scale(x)      Standard scores (z-scores) for the numbers in vector x
Relative Standing
Function      What it Calculates
sort(x)       The numbers in vector x in increasing order
sort(x)[n]    The nth smallest number in vector x
rank(x)       Ranks of the numbers (in increasing order) in vector x
rank(-x)      Ranks of the numbers (in decreasing order) in vector x
rank(x, ties.
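To see the relative-standing functions in action, here is a short sketch. The vector x holds made-up numbers chosen only for illustration.

```r
# Made-up scores, for illustration only
x <- c(62, 45, 78, 49)

sort(x)      # 45 49 62 78 (increasing order)
sort(x)[2]   # 49 (the 2nd smallest number)
rank(x)      # 3 1 4 2 (rank 1 goes to the smallest number)
rank(-x)     # 2 4 1 3 (rank 1 goes to the largest number)
```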
The normal distribution family is one of many distribution families baked into R. Dealing with these families is intuitive. Follow these guidelines: Begin with the distribution family’s name in R (norm for the normal family, for example). To the beginning of the family name, add d to work with the probability density function.
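As a quick sketch of the naming pattern, the calls below use the four base R functions for the normal family: dnorm(), pnorm(), qnorm(), and rnorm(). The mean of 100 and standard deviation of 15 are assumed values picked for illustration.

```r
# Assumed parameters, for illustration only
m <- 100
s <- 15

d <- dnorm(100, mean = m, sd = s)  # d: probability density at 100
p <- pnorm(115, mean = m, sd = s)  # p: cumulative probability up to 115, about 0.8413
q <- qnorm(0.5, mean = m, sd = s)  # q: quantile, here the 50th percentile, which is 100

set.seed(1)                        # make the random draw repeatable
r <- rnorm(5, mean = m, sd = s)    # r: five random numbers from the distribution
```

The same four prefixes work with the other distribution families in base R, such as t (dt(), pt(), qt(), rt()).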
Dot charts are yet another way of visualizing data in the following table. Noted graphics honcho William Cleveland believes that people perceive values along a common scale (as in a bar plot) better than they perceive areas (as in a pie graph). So he came up with the dot chart, shown in the figure.
Types and Frequencies of Cars in the Cars93 data frame
Type      Frequency
Compact   16
Large     11
Midsize   22
Small     21
Sporty    14
Van        9
Dot chart for the data in the table.
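A dot chart like this one can be sketched with the base R dotchart() function. The code below types in the frequencies from the table directly; the data frame name type.frame and the plotting options are assumptions for illustration, not necessarily the article's own code.

```r
# Frequencies copied from the table of car types
type.frame <- data.frame(
  Type = c("Compact", "Large", "Midsize", "Small", "Sporty", "Van"),
  Frequency = c(16, 11, 22, 21, 14, 9)
)

# One dot per car type, positioned along a common frequency scale
dotchart(type.frame$Frequency, labels = type.frame$Type,
         xlab = "Frequency", pch = 19)
```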
Data often resides in long, complex tables. Often, you have to visualize only a portion of the table to find a pattern or a trend. A good example is the Cars93 data frame, which resides in the MASS package. This data frame holds data on 27 variables for 93 car models that were available in 1993. The figure shows part of the data frame in the Data Editor window that opens after you type
> edit(Cars93)
Part of the Cars93 data frame.
You can create a histogram from the Cars93 data frame, which resides in the MASS package. This data frame holds data on 27 variables for 93 car models that were available in 1993. The figure shows part of the data frame in the Data Editor window that opens after you type
> edit(Cars93)
To create a histogram of the distribution of prices in that data frame, you'd enter:
> hist(Cars93$Price)
which produces the following figure.
The base R graphics toolset will get you started, but if you really want to shine at visualization, it's a good idea to learn ggplot2. ggplot2 offers an easy-to-learn structure for R graphics code. To learn that structure, make sure you have ggplot2 in the library so that you can follow what comes next. (Find ggplot2 on the Packages tab and click its check box.
R is a computer language. It's a tool for doing the computation and number-crunching that set the stage for statistical analysis and decision-making. RStudio is an open source integrated development environment (IDE) for creating and running R code. It's available in versions for Windows, Mac, and Linux. Although you don't need an IDE in order to work with R, RStudio makes life a lot easier.
Visualizing a distribution often helps you understand it. The process can be a bit involved in R, but it's worth the effort. The figure shows three members of the t-distribution family on the same graph. The first has df = 3, the second has df = 10, and the third is the standard normal distribution (df = infinity).
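One possible base R sketch of such a graph follows. The range of t values, the line types, and the legend placement are assumptions chosen for illustration, so the result will not match the article's figure exactly.

```r
# A grid of t values to evaluate the densities on (assumed range)
t.values <- seq(-4, 4, 0.1)

# Start with the standard normal curve, the tallest of the three
plot(t.values, dnorm(t.values), type = "l", lty = "solid",
     xlab = "t", ylab = "f(t)")

# Add the two t-distributions to the existing plot
lines(t.values, dt(t.values, df = 10), lty = "dashed")
lines(t.values, dt(t.values, df = 3), lty = "dotted")

legend("topright", bty = "n",
       legend = c("df = 3", "df = 10", "standard normal"),
       lty = c("dotted", "dashed", "solid"))
```

The fewer the degrees of freedom, the lower the peak and the heavier the tails.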
The grammar-of-graphics approach takes considerably more effort when plotting the values of a t-distribution than base R. But follow along and you'll learn a lot about ggplot2. You start by putting the relevant numbers into a data frame:
t.frame = data.frame(t.values, df3 = dt(t.values,3),df10 = dt(t.values,10), std_normal = dnorm(t.
Functions are built into R. Each one consists of a function name immediately followed by parentheses, such as c(), sum(), mean(), and var(). Inside the parentheses are the arguments. In this context, "argument" doesn't mean "disagreement," "confrontation," or anything like that. It's just the math term for whatever a function operates on.
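A short sketch shows the four functions named above at work. The numbers passed to c() are made up for illustration.

```r
# c() combines its arguments into a vector
x <- c(5, 10, 15, 20, 25, 30, 35, 40)

sum(x)    # 180  (the argument is the vector x)
mean(x)   # 22.5
var(x)    # 150
```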
Perhaps the fundamental descriptive statistic is the number of scores in a set of data. length() is the R function that calculates this number. Work with the Cars93 data frame, which is in the MASS package. (Click the check box next to MASS on the Packages tab.) Cars93 holds data on 27 variables for 93 cars available in 1993.
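As a minimal sketch, assuming the MASS package is installed, you can check those counts yourself. Picking the Price column is an arbitrary choice; any column of the data frame has the same length.

```r
library(MASS)  # loads the package that holds Cars93

n.cars <- length(Cars93$Price)  # number of scores in one column: 93
n.vars <- ncol(Cars93)          # number of variables (columns): 27
```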
Base R provides a nice way of visualizing relationships among more than two variables. If you add price into the mix and you want to show all the pairwise relationships among MPG-city, price, and horsepower, you'd need multiple scatter plots. R can plot them all together in a matrix, as the figure shows. Multiple scatter plots for the relationships among MPG-city, price, and horsepower.
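One way to sketch such a matrix is with the base R pairs() function, assuming the MASS package is installed. The name cars.subset is made up for this example.

```r
library(MASS)  # Cars93 lives in the MASS package

# Keep only the three variables of interest
cars.subset <- Cars93[, c("MPG.city", "Price", "Horsepower")]

# Draw every pairwise scatter plot in one matrix
pairs(cars.subset)
```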
After you calculate the variance of a set of numbers, you have a value whose units are different from your original measurements. For example, if your original measurements are in inches, their variance is in square inches. This is because you square the deviations before you average them. So the variance in the five-score population in the preceding example is 6.
The R function for calculating standard scores is called scale(). Supply a vector of scores, and scale() returns a vector of z-scores along with, helpfully, the mean and the standard deviation. To show scale() in action, isolate a subset of the Cars93 data frame. (It's in the MASS package. On the Packages tab, check the box next to MASS if it's unchecked.
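Here is a small sketch of scale() on a made-up vector rather than on Cars93. The scores are illustrative only. The mean and standard deviation come back as attributes of the result.

```r
# Made-up scores, for illustration only
scores <- c(50, 47, 52, 46, 45)

z <- scale(scores)                     # a one-column matrix of z-scores

z.mean <- attr(z, "scaled:center")     # the mean that was subtracted: 48
z.sd   <- attr(z, "scaled:scale")      # the standard deviation that was divided by
z.vec  <- as.vector(z)                 # plain vector of z-scores, without attributes
```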
In statistics, moments are quantities that are related to the shape of a set of numbers. "Shape of a set of numbers" means "what a histogram based on the numbers looks like": how spread out it is, how symmetric it is, and more. A raw moment of order k is the average of all numbers in the set, with each number raised to the kth power before you average it.
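That definition translates almost word for word into R. The function name raw.moment and the sample numbers are made up for this sketch.

```r
# Raw moment of order k: raise each number to the kth power, then average
raw.moment <- function(x, k) {
  mean(x^k)
}

x <- c(1, 2, 3, 4)   # made-up numbers
raw.moment(x, 1)     # 2.5 (the first raw moment is just the mean)
raw.moment(x, 2)     # 7.5 (average of 1, 4, 9, 16)
```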
The empirical cumulative distribution function (ecdf) is closely related to cumulative frequency. Rather than show the frequency in an interval, however, the ecdf shows the proportion of scores that are less than or equal to each score. In base R, it's easy to plot the ecdf:
plot(ecdf(Cars93$Price), xlab = "Price", ylab = "Fn(Price)")
This produces the following figure.
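It may help to know that ecdf() returns a function you can call directly. The sketch below uses a made-up vector of scores instead of Cars93.

```r
# Made-up scores, for illustration only
scores <- c(10, 20, 20, 30, 40)

Fn <- ecdf(scores)   # Fn is itself a function

Fn(20)   # 0.6 (3 of the 5 scores are less than or equal to 20)
Fn(40)   # 1   (every score is less than or equal to 40)
Fn(5)    # 0   (no score is that small)
```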
You might think that the function chisq.test() would be the best way to test a variance in R. Although base R provides this function, it's not appropriate here. Statisticians use this function to test other kinds of hypotheses. Instead, turn to a function called varTest, which is in the EnvStats package. On the Packages tab, click Install.
Median is a fancy name for a simple concept: It's the middle value in a group of numbers. Arrange the numbers in order, and the median is the value below which half the scores fall and above which half the scores fall:
> sort(reading.speeds)
[1] 45 49 55 56 62 78
> sort(reading.speeds.new)
[1] 45 49 55 56 62 180
In each case, the median is halfway between 55 and 56, or 55.
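The base R median() function does this for you. In the sketch below, the two vectors are rebuilt from the sorted output shown above, so their original order is an assumption. The comparison with mean() is added only to show how differently the two statistics react to the extreme score.

```r
# Vectors rebuilt from the sorted output (original order unknown)
reading.speeds     <- c(45, 49, 55, 56, 62, 78)
reading.speeds.new <- c(45, 49, 55, 56, 62, 180)

med.old <- median(reading.speeds)
med.new <- median(reading.speeds.new)
med.old == med.new          # TRUE: the extreme score leaves the median alone

mean(reading.speeds)        # 57.5
mean(reading.speeds.new)    # 74.5: the mean gets pulled toward the extreme score
```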
Base R does not provide a function for finding the mode, even though the mode is an important measure of central tendency. It's the score that occurs most frequently in a group of scores. Sometimes the mode is the best measure of central tendency to use. Imagine a small company that consists of 30 consultants and two high-ranking officers.
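Since base R has no ready-made function, here is one way you might sketch your own with table(). The function name find.mode and the sample scores are made up for illustration; this is not the article's own solution, and packages offer alternatives.

```r
# Return the most frequent value(s) in x; ties produce more than one mode
find.mode <- function(x) {
  counts <- table(x)                             # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]  # value(s) with the highest count
  # names() returns character strings, so convert back for numeric input
  if (is.numeric(x)) as.numeric(modes) else modes
}

scores <- c(1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 6)   # made-up scores
find.mode(scores)   # 2 5 (each occurs three times)
```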
Working with the standard normal distribution in R couldn’t be easier. The only change you make to the four norm functions is to not specify a mean and a standard deviation; the defaults are 0 and 1. Here are some examples:
> dnorm(0)
[1] 0.3989423
> pnorm(0)
[1] 0.5
> qnorm(c(.25,.50,.75))
[1] -0.6744898 0.
Calculating variance in R is simplicity itself. You use the var() function. But which variance does it give you? The one with N in the denominator or the one with N-1? Time to find out:
> heights <- c(50, 47, 52, 46, 45)
> var(heights)
[1] 8.5
It calculates the estimated variance (with N-1 in the denominator).
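If you need the version with N in the denominator, one simple approach is to rescale the result of var(). This sketch reuses the heights vector from the example; the variable names are made up.

```r
heights <- c(50, 47, 52, 46, 45)
N <- length(heights)

est.var <- var(heights)             # 8.5, with N-1 in the denominator
pop.var <- est.var * (N - 1) / N    # 6.8, with N in the denominator

# Same result, computed straight from the definition
mean((heights - mean(heights))^2)   # 6.8
```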
An R function called z.test() would be great for doing the kind of testing in which you use z-scores in the hypothesis test. One problem: That function does not exist in base R. Although you can find one in other packages, it's easy enough to create one and learn a bit about R programming in the process. The function will work like this:
> IQ.
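The article's own function is cut off here, so what follows is only a rough sketch of how a one-sample z-test function might look. The name z.test.sketch, its arguments, its return value, and the sample scores are all assumptions made for illustration.

```r
# Hypothetical one-sample z-test: x = sample, mu = hypothesized mean,
# popsd = known population standard deviation
z.test.sketch <- function(x, mu, popsd) {
  z <- (mean(x) - mu) / (popsd / sqrt(length(x)))   # z-score of the sample mean
  one.tail.p <- pnorm(abs(z), lower.tail = FALSE)   # area beyond |z| in one tail
  list(z = z, one.tailed.p = one.tail.p, two.tailed.p = 2 * one.tail.p)
}

# Made-up scores: nine values with a mean of 108
sample.scores <- c(102, 110, 98, 115, 107, 111, 105, 120, 104)
result <- z.test.sketch(sample.scores, mu = 100, popsd = 15)

result$z              # 1.6
result$one.tailed.p   # about 0.055
```

Returning a list keeps the z-score and both probabilities available for later use instead of just printing them.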