Articles From John C. Pezzullo
Cheat Sheet / Updated 07-27-2024
To estimate sample size in biostatistics, you must state the effect size of importance, or the effect size worth knowing about. If the true effect size is less than the "important" size, you don't care if the test comes out nonsignificant. With a few shortcuts, you can pick an important effect size and find out how many participants you need, based on that effect size, for several common statistical tests.

All the graphs, tables, and rules of thumb here are for 80 percent power and α = 0.05. In other words, the guidance applies to calculating the sample size you need in order to have an 80 percent chance of getting a p value that's less than or equal to 0.05. If you want sample sizes for other values of power and α, use these simple scale-up rules:

For 90 percent power instead of 80 percent: Increase N by a third (multiply N by 1.33).

For α = 0.01 instead of 0.05: Increase N by a half (multiply N by 1.5).

For 90 percent power and α = 0.01: Double N (multiply N by 2).
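If you apply these scale-up rules often, they are easy to wrap in a few lines of R. This is only a convenience sketch of the rules above; the function and argument names are invented for this example, not part of any package.

# n80: sample size computed for 80 percent power and alpha = 0.05
scale_up_n <- function(n80, power = 0.80, alpha = 0.05) {
  factor <- 1
  if (power == 0.90 && alpha == 0.05) factor <- 1.33   # 90% power instead of 80%
  if (power == 0.80 && alpha == 0.01) factor <- 1.5    # alpha = 0.01 instead of 0.05
  if (power == 0.90 && alpha == 0.01) factor <- 2      # both changes at once
  ceiling(n80 * factor)                                # round up to whole subjects
}

scale_up_n(64, power = 0.90)   # 64 scaled up by a third, about 86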
Article / Updated 03-26-2016
Biostatistics, in its present form, is the cumulative result of four centuries of contributions from many mathematicians and scientists. Some are well known, and some are obscure; some are famous people you never would've suspected of being statisticians, and some are downright eccentric and unsavory characters. This list gives you some highlights of the contributions of a few of the many people who made statistics (and therefore biostatistics) what it is today.

Thomas Bayes (ca. 1701–1761). A Presbyterian minister and amateur mathematician, Bayes lived long before the field of statistics (as we know it) existed; people were still struggling to work out the basic laws of probability. Bayes dabbled with the "inverse probability" problem (figuring out what a population must be like, based on observing a sample from that population), but he never bothered to publish his work. Nevertheless, a formula he developed eventually became the foundation for Bayesian statistics — one of the two major branches of statistical theory (the other being frequentist statistics). Bayesian statistics wasn't used to solve real-world problems until more than two centuries after the death of its creator. For more information, check out Thomas Bayes in Encyclopedia.com and Wikipedia.org.

Pierre-Simon Laplace (1749–1827). While Laplace did most of his work in astronomy (one of his driving ambitions was to prove that the solar system wouldn't fly apart), he also made fundamental discoveries in mathematics. He helped put Bayesian statistics on a firm theoretical foundation, and he helped formulate the least-squares criterion for estimating population parameters. He was also one of the first scientists to suggest the existence of black holes due to gravitational collapse (and you thought that was a "modern" concept)! For more information, check out Pierre-Simon Laplace in Encyclopedia.com and Wikipedia.org.

Carl Friedrich Gauss (1777–1855). Sometimes called "the Prince of Mathematicians," Gauss made contributions ranging from the most abstract and theoretical to the most practical. He developed nonlinear least-squares regression, found efficient ways to solve simultaneous equations, and discovered what's now called the "fast Fourier transform" (FFT), without which the creation of CAT scans and MRI images would be hopelessly time-consuming. The normal distribution, with its bell-shaped curve, is often called a Gaussian distribution in his honor. For more information, check out Encyclopedia.com and Wikipedia.org coverage of Carl Friedrich Gauss.

John Snow (1813–1858). A London physician, Snow was investigating a cholera outbreak and noticed that all the victims had been using a recently dug public water pump, located three feet from an old, leaking cesspool. After he convinced skeptical local officials to remove the pump handle, the epidemic quickly petered out (after which the officials promptly reinstalled the handle). Snow's study marks the birth of the science of epidemiology (closely related to biostatistics), which studies the patterns, causes, and effects of health and disease conditions in specific populations. Snow also played a major role in popularizing the use of anesthesia in surgical and obstetrical procedures (helped by his giving chloroform to Queen Victoria during the deliveries of the last two of her nine children). For more information, check out Encyclopedia.com and Wikipedia.org.

Florence Nightingale (1820–1910). Who would think that the famous "Lady with the Lamp" from the Crimean War, the founder of professional nursing, was also a statistician! But she was. She could convey complicated ideas in simple English and summarize data with easily understood graphs, including a special kind of pie chart she invented, called a polar area diagram. With the help of graphics that even politicians could understand, she was able to bring about profound improvements in medical care and public health. For more information, check out Encyclopedia.com and Wikipedia.org.

Karl Pearson (1857–1936). The "founder of mathematical statistics" was an interesting character, to say the least — anti-Semitic, socialist, and an ardent eugenicist whose extreme views were part of the philosophical underpinnings of the Third Reich's Holocaust. But his influence on the development of statistics was enormous — including the concept of statistical hypothesis testing, the correlation coefficient, the chi-square test, the p value, and factor analysis (to mention only a few), all of which he developed to further the scientific credibility of his outlandish views. For more information, check out Encyclopedia.com and Wikipedia.org.

William S. Gosset (1876–1937). Gosset worked for the Guinness Brewery in Dublin, where he encountered the problem of comparing the means of small samples. With some help from Karl Pearson, Gosset came up with the correct solution. Not being a high-powered mathematician, he relied on brilliant intuition to come up with a guess at the answer, which he then confirmed by painstaking and time-consuming simulations conducted entirely by hand (computers hadn't been invented yet). Guinness wouldn't let him publish his results under his real name; they made him use the pen name "Student" instead, forever depriving him of the name recognition he truly deserved. What everyone calls the Student t test and the Student t distribution should really have been the Gosset t test and the Gosset t distribution. A pity indeed. For more information, check out Encyclopedia.com and Wikipedia.org.

Ronald A. Fisher (1890–1962). Perhaps the most towering figure in the development of statistical techniques in use today, Fisher invented the analysis of variance and the Fisher exact test for analyzing cross-tabulated data (the chi-square test was only approximate). Like Karl Pearson, Fisher was a rabid eugenicist and racist and (in retrospect) was on the wrong side of other important issues — he argued against the idea that smoking caused lung cancer. And his opposition to Bayesian statistics may be partly responsible for the subordinate role of Bayesian methods during most of the 20th century. For more information, check out Encyclopedia.com and Wikipedia.org.

John W. Tukey (1915–2000). A pioneer in promoting exploratory data analysis (carefully examining what the data's trying to say before jumping into formal statistical testing), Tukey invented the box-and-whiskers plot and the stem-and-leaf plot as aids to visualizing how a set of numbers is distributed. He also developed one of the best so-called post-hoc tests for determining which pairs of groups of numbers are significantly different from which others. A pioneer in computing as well, he coined the term bit as a nickname for "binary digit" and was either the first or second person to use the term software in print. For more information, check out Wikipedia.org.

David R. Cox (1924–). A very productive, "modern" statistician, Cox made pioneering contributions to many areas of statistics, including the design of experiments. He's most famous for developing a way to apply regression analysis to survival data when the general shape of the survival curve can't be represented by a mathematical formula. His original paper describing this proportional-hazards model (now usually referred to simply as Cox regression) is one of the most often-cited articles in all medical literature. For more information, check out Wikipedia.org.
Article / Updated 03-26-2016
One of the reasons (but not the only reason) for running a multiple regression analysis is to come up with a prediction formula for some outcome variable, based on a set of available predictor variables. Ideally, you'd like this formula to be parsimonious — to have as few variables as possible, but still make good predictions. So how do you select, from among a big bunch of predictor variables, the smallest subset needed to make a good prediction model? This is called the "model building" problem, which is a topic of active research by theoretical statisticians. No single method has emerged as the best way to select which variables to include. Unfortunately, researchers often use informal methods that seem reasonable but really aren't very good, such as the following:

Do a big multiple regression using all available predictors, and then drop the ones that didn't come out significant. This approach may miss some important predictors because of collinearity.

Run univariate regressions on every possible predictor individually, and then select only those predictors that were significant (or nearly significant) on the univariate tests. But sometimes a truly important predictor variable isn't significantly associated with the outcome when tested by itself, but only when the effects of some other variable have been compensated for. This problem is the reverse of the disappearing-significance problem — it's not nearly as common, but it can happen.

There is another way — many statistics packages offer stepwise regression, in which you provide all the available predictor variables, and the program then goes through a process similar to what a human (with a logical mind and a lot of time on their hands) might do to identify the best subset of those predictors. The program very systematically tries adding and removing the various predictors from the model, one at a time, looking to see which predictors, when added to a model, substantially improve its predictive ability, or, when removed from the model, make it substantially worse. Stepwise regression can use several different algorithms, and models can be judged to be better or worse by several different criteria. In general, these methods often do a decent job of the following:

Detecting and dropping variables that aren't associated with the outcome, either in univariate or multiple regression

Detecting and dropping redundant variables (predictors that are strongly associated with even better predictors of the outcome)

Detecting and including variables that may not have been significant in univariate regression but that are significant when you adjust for the effects of other variables

Most stepwise regression software also lets you "force" certain variables into the model, if you know (from physiological evidence) that these variables are important predictors of the outcome.
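As one illustration, R's built-in step() function performs this kind of stepwise search on a fitted regression model, judging candidate models by AIC (its default criterion). This is only a sketch; the data frame and variable names (mydata, outcome) are placeholders rather than data from any real study.

# Fit the full model containing every available predictor in mydata.
full <- lm(outcome ~ ., data = mydata)

# Let step() repeatedly try adding and dropping terms, one at a time,
# keeping whichever model has the best (lowest) AIC.
best <- step(full, direction = "both", trace = 0)
summary(best)

Forcing a variable into the model is handled through step()'s scope argument, whose lower component defines a minimal set of terms that the search is never allowed to remove.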
Article / Updated 03-26-2016
Two quite different ideas about probability have coexisted for more than a century. These probability approaches, which differ in several important ways, are as follows:

The frequentist view defines the probability of some event in terms of the relative frequency with which the event tends to occur.

The Bayesian view defines probability in more subjective terms — as a measure of the strength of your belief regarding the true situation. (A less subjective formulation of Bayesian philosophy still assigns probabilities to the "population parameters" that define the true situation.)

Most statistical problems can be solved using either frequentist or Bayesian techniques, but the frequentist approach is much more widely used, and most of the statistical techniques in use today are based on the frequentist view of probability. This predominance is because the frequentist approach usually involves simpler calculations. Only recently have sufficiently powerful computers and sufficiently sophisticated software become available to allow real-world problems to be tackled within the Bayesian framework. Here's how the frequentist and Bayesian views differ:

Ways of reasoning: These two philosophies of probability apply different directions of reasoning. Frequentists think deductively: "If the true population looks like this, then my sample might look like this." Bayesians think inductively: "My sample came out like this, so the true situation might be this."

Ideas about what's random: The two philosophies have different views of what is random. To the frequentist, the population parameters are fixed (but unknown), and the observed data is random, with sampling distributions that give the probabilities of observing various outcomes based on the values of certain population parameters. But in the Bayesian view, the observed data is fixed (after all, we know what we saw); it's the population parameters that are random and have probability distributions associated with them, based on the observed outcomes.

Terminology: Frequentists and Bayesians use different terminology. Frequentists never talk about the probability that a statement is true or the probability that the true value lies within some interval. And Bayesians never use terms like p value, significant, null hypothesis, or confidence interval, which sound so familiar to those statisticians brought up in the frequentist tradition; instead, they use terms like prior probability, noninformative priors, and credible intervals.

Usable information: Frequentists typically think of the data from each experiment as a self-contained bundle of information, and they draw conclusions strictly from what's in that set of data. Bayesians have a broader view of "usable information" — they typically start with some prior probabilities (preexisting beliefs about what the truth might be, perhaps based on previous experiments) and then blend in the results of their latest experiment to revise those probabilities (that is, to update their spread of belief about the true situation). These revised probabilities may become the prior probabilities in the analysis of their next experiment.
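To make the contrast concrete, here is a small R sketch for a made-up example in which 7 of 20 treated subjects respond (the numbers are purely illustrative). The frequentist interval comes from binom.test(); the Bayesian credible interval comes from combining a flat Beta(1, 1) prior with the observed data.

# Frequentist: exact 95% confidence interval for the response proportion
binom.test(7, 20)$conf.int

# Bayesian: a flat Beta(1, 1) prior combined with 7 successes and 13
# failures gives a Beta(8, 14) posterior; its 2.5th and 97.5th centiles
# form a 95% credible interval.
qbeta(c(0.025, 0.975), shape1 = 1 + 7, shape2 = 1 + 13)

With a flat prior and a reasonable amount of data, the two intervals usually come out numerically similar; the difference lies in how they are interpreted, as described above.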
Article / Updated 03-26-2016
You can calculate the standard error (SE) and confidence interval (CI) of the more common sample statistics (means, proportions, event counts and rates, and regression coefficients). But an SE and CI exist (theoretically, at least) for any number you could possibly wring from your data — medians, centiles, correlation coefficients, and other quantities that might involve complicated calculations, like the area under a concentration-versus-time curve (AUC) or the estimated five-year survival probability derived from a survival analysis. Formulas for the SE and CI around these numbers might not be available or might be hopelessly difficult to evaluate. Also, the formulas that do exist might apply only to normally distributed numbers, and you might not be sure what kind of distribution your data follows.

Consider a very simple problem. Suppose you've measured the IQ of 20 subjects and have gotten the following results: 61, 88, 89, 89, 90, 92, 93, 94, 98, 98, 101, 102, 105, 108, 109, 113, 114, 115, 120, and 138. These numbers have a mean of 100.85 and a median of 99.5. Because you're a good scientist, you know that whenever you report some number you've calculated from your data (like a mean or median), you'll also want to indicate the precision of that value in the form of an SE and CI.

For the mean, if you can assume that the IQ values are approximately normally distributed, things are pretty simple. You can calculate the SE of the mean as 3.54 and the 95% CI around the mean as 93.4 to 108.3. But what about the SE and CI for the median, for which there are no simple formulas? And what if you can't be sure those IQ values come from a normal distribution? Then the simple formulas might not be reliable.

Fortunately, there is a very general method for estimating SEs and CIs for anything you can calculate from your data, and it doesn't require any assumptions about how your numbers are distributed. The SE of any sample statistic is the standard deviation (SD) of the sampling distribution for that statistic. And the 95% confidence limits of a sample statistic are well approximated by the 2.5th and 97.5th centiles of the sampling distribution of that statistic. So if you could replicate your entire experiment many thousands of times (using a different sample of subjects each time), and each time calculate and save the value of the thing you're interested in (median, AUC, or whatever), this collection of thousands of values would be a very good approximation to the sampling distribution of the quantity of interest. Then you could estimate the SE simply as the SD of the sampling distribution, and the confidence limits from the centiles of the distribution.

But actually carrying out this scenario isn't feasible — you probably don't have the time, patience, or money to perform your entire study thousands of times. Fortunately, you don't have to repeat the study thousands of times to get an estimate of the sampling distribution. You can do it by reusing the data from your one actual study, over and over again! This may sound too good to be true, and statisticians were very skeptical of this method when it was first proposed. They called it bootstrapping, comparing it to the impossible task of "picking yourself up by your bootstraps." But it turns out that if you keep reusing the same data in a certain way, this method actually works.
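Assuming the IQ values really are approximately normal, the "simple formula" results for the mean quoted above take only a few lines of R to reproduce:

iq <- c(61, 88, 89, 89, 90, 92, 93, 94, 98, 98,
        101, 102, 105, 108, 109, 113, 114, 115, 120, 138)
mean(iq)                    # 100.85
median(iq)                  # 99.5
sd(iq) / sqrt(length(iq))   # SE of the mean, about 3.54
t.test(iq)$conf.int         # 95% CI for the mean, about 93.4 to 108.3

There is no comparable formula-based one-liner for the SE or CI of the median, and that is exactly the gap the bootstrap fills.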
Over the years, the bootstrap procedure has become an accepted way to get reliable estimates of SEs and CIs for almost anything you can calculate from your data; in fact, it's often considered to be the "gold standard" against which various approximation formulas for SEs and CIs are judged. To see how the bootstrap method works, here's how you would use it to estimate the SE and 95% CI of the mean and the median of the 20 IQ values shown earlier. You have to resample your 20 numbers, over and over again, in the following way:

1. Write each of your measurements on a separate slip of paper and put them all into a bag. In this example, you write the 20 measured IQs on separate slips.

2. Reach in and draw out one slip, write that number down, and put the slip back into the bag. (That last part is very important!)

3. Repeat Step 2 as many times as needed to match the number of measurements you have, returning the slip to the bag each time. This is called resampling with replacement, and it produces a resampled data set. In this example, you repeat Step 2 19 more times, for a total of 20 draws (which is the number of IQ measurements you have).

4. Calculate the desired sample statistic of the resampled numbers from Steps 2 and 3, and record that number. In this example, you find the mean and the median of the 20 resampled numbers.

5. Repeat Steps 2 through 4 many thousands of times. Each time, you generate a new resampled data set from which you calculate and record the desired sample statistics (in this case, the mean and median of the resampled data set). You wind up with thousands of values for the mean and thousands of values for the median. In each resampled data set, some of the original values may occur more than once, and some may not be present at all. Almost every resampled data set will be different from all the others. The bootstrap method is based on the fact that these mean and median values from the thousands of resampled data sets comprise a good estimate of the sampling distribution for the mean and median. Collectively, they resemble the kind of results you might have gotten if you had repeated your actual study over and over again.

6. Calculate the standard deviation of your thousands of values of the sample statistic. This gives you a "bootstrapped" estimate of the SE of the sample statistic. In this example, you calculate the SD of the thousands of means to get the SE of the mean, and you calculate the SD of the thousands of medians to get the SE of the median.

7. Obtain the 2.5th and 97.5th centiles of the thousands of values of the sample statistic. You do this by sorting your thousands of values of the sample statistic into numerical order, and then chopping off the lowest 2.5 percent and the highest 2.5 percent of the sorted set of numbers. The smallest and largest values that remain are the bootstrapped estimates of the low and high 95% confidence limits for the sample statistic. In this example, the 2.5th and 97.5th centiles of the means and medians of the thousands of resampled data sets are the 95% confidence limits for the mean and median, respectively.

Obviously you'd never try to do this bootstrapping process by hand, but it's quite easy to do with software like the free Statistics101 program. You can enter your observed results and tell it to generate, say, 100,000 resampled data sets, calculate and save the mean and the median from each one, and then calculate the SD and the 2.5th and 97.5th centiles of those 100,000 means and 100,000 medians.
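If you'd rather script the resampling yourself than use Statistics101, the steps above translate almost line for line into R. This is only a sketch of the general procedure, using the same 20 IQ values and the same 100,000 resamples as the example that follows:

iq <- c(61, 88, 89, 89, 90, 92, 93, 94, 98, 98,
        101, 102, 105, 108, 109, 113, 114, 115, 120, 138)

# Steps 2-5: draw 100,000 resampled data sets (sampling the 20 values
# with replacement) and record the mean and median of each one.
boot <- replicate(100000, {
  resamp <- sample(iq, length(iq), replace = TRUE)
  c(mean = mean(resamp), median = median(resamp))
})

# Step 6: the SD of the resampled statistics is the bootstrapped SE.
sd(boot["mean", ])      # bootstrapped SE of the mean
sd(boot["median", ])    # bootstrapped SE of the median

# Step 7: the 2.5th and 97.5th centiles are the bootstrapped 95% CI.
quantile(boot["mean", ],   c(0.025, 0.975))
quantile(boot["median", ], c(0.025, 0.975))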
Here are a few results from a bootstrap analysis performed on this data:

Actual Data: 61, 88, 89, 89, 90, 92, 93, 94, 98, 98, 101, 102, 105, 108, 109, 113, 114, 115, 120, and 138. Mean = 100.85; Median = 99.5

Resampled Data Set #1: 61, 88, 88, 89, 89, 90, 92, 93, 98, 102, 105, 105, 105, 109, 109, 109, 109, 114, 114, and 120. Mean = 99.45, Median = 103.50

Resampled Data Set #2: 61, 88, 89, 89, 90, 92, 92, 98, 98, 98, 102, 105, 105, 108, 108, 113, 113, 113, 114, and 138. Mean = 100.7, Median = 100.0

(Between Set #2 and the following set, 99,996 more bootstrapped data sets were generated.)

Resampled Data Set #99,999: 61, 61, 88, 89, 92, 93, 93, 94, 98, 98, 98, 101, 102, 105, 109, 114, 115, 120, 120, and 138. Mean = 99.45, Median = 98.00

Resampled Data Set #100,000: 61, 61, 61, 88, 89, 89, 90, 93, 93, 94, 102, 105, 108, 109, 109, 114, 115, 115, 120, and 138. Mean = 97.7, Median = 98.0

Here's a summary of the 100,000 resamples:

The SD of the 100,000 means = 3.46; this is the bootstrapped SE of the mean (SEM).

The SD of the 100,000 medians = 4.24; this is the bootstrapped SE of the median.

The 2.5th and 97.5th centiles of the 100,000 means = 94.0 and 107.6; these are the bootstrapped 95% confidence limits for the mean.

The 2.5th and 97.5th centiles of the 100,000 medians = 92.5 and 108.5; these are the bootstrapped 95% confidence limits for the median.

So you would report your mean and median, along with their bootstrapped standard errors and 95% confidence intervals, this way: Mean = 100.85 ± 3.46 (94.0–107.6); Median = 99.5 ± 4.24 (92.5–108.5).

You'll notice that the SE is larger (and the CI is wider) for the median than for the mean. This is generally true for normally distributed data — the median has about 25% more variability than the mean. But for non-normally distributed data, the median is often more precise than the mean.

You don't need to use bootstrapping for something as simple as the SE or CI of a mean, because there are simple formulas for that. But the bootstrap method can just as easily calculate the SE or CI for a median, a correlation coefficient, or a pharmacokinetic parameter like the AUC or elimination half-life of a drug, for which there are no simple SE or CI formulas and for which the normality assumptions might not apply.

Bootstrapping is conceptually simple, but it's not foolproof. The method involves certain assumptions and has certain limitations. For example, it's probably not going to be very useful if you have only a few observed values. Check out the Statistics101 website for more information on using the bootstrap method (and for the free Statistics101 software to do the bootstrap calculations very easily).
Article / Updated 03-26-2016
Modern statistical software makes it easy for you to analyze your data in most of the situations that you're likely to encounter (summarize and graph your data, calculate confidence intervals, run common significance tests, do regression analysis, and so on). But occasionally you may run into a problem for which no preprogrammed solution exists. Deriving new statistical techniques can involve some very complicated mathematics, and usually only a professional theoretical statistician attempts to do so. But there's a simple yet general and powerful way to get answers to a lot of statistical questions, even if you aren't a math whiz. It's called simulation, or the Monte Carlo technique.

Statistics is the study of random fluctuations, and most statistical problems really come down to the question "What are the random fluctuations doing?" Well, it turns out that computers are very good at drawing random numbers from a variety of distributions. With the right software, you can program a computer to generate random fluctuations that embody the problem you're trying to solve; then you can simply see what those fluctuations did. You can then rerun this process many times and summarize what happened in the long run. The simulation approach can be used to solve problems in probability theory, determine statistical significance in common or uncommon situations, calculate the power of a proposed study, and much more.

Here's a simple, if somewhat contrived, example of what simulation can do: What's the chance that the product of the IQs of two randomly chosen people is greater than 12,000? IQs are normally distributed, with a mean of 100 and a standard deviation of 15. (And don't ask why anyone would want to multiply two IQ scores together; it's just an example!) As simple as this question may sound, it's a very difficult problem to solve exactly, and you'd have to be an expert mathematician to even attempt it. But it's very easy to get an answer by simulation. Just do this:

1. Generate two random IQ numbers (normally distributed, with mean = 100 and SD = 15).

2. Multiply the two IQ numbers together.

3. See whether the product is greater than 12,000.

4. Repeat Steps 1–3 a million times and count how many times the product exceeds 12,000.

5. Divide that count by a million, and you have your probability.

This simulation can be set up using the free Statistics101 program or even Excel. Using R software, the five steps can be programmed in a single line:

sum(rnorm(1000000,100,15)*rnorm(1000000,100,15)>12000)/1000000

Even if you're not familiar with R's syntax, you can probably catch the drift of what this program is doing. Each "rnorm" function generates a million random IQ scores. The "*" multiplies them together pairwise. The ">" compares each of the one million products to 12,000. The "sum" function adds up the number of times the comparison comes out true (true counts as 1; false counts as 0). The "/" divides the sum by a million. R prints out the results of "one-liner" programs like this one without your having to explicitly tell it to.

When one person ran this program on his desktop computer, it computed for about a half-second and then printed the result: 0.172046. Then he ran it again, and it printed 0.172341. That's a characteristic of simulation methods — they give slightly different results each time you run them. And the more simulations you run, the more accurate the results will be. That's why the preceding steps ask for a million repetitions.
You won’t get an exact answer, but you’ll know that the probability is around 0.172, which is close enough.
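If the one-liner is too terse to follow, here is a sketch of the same simulation in R with the five steps written out separately (same assumptions: IQs normal with mean 100 and SD 15, one million simulated pairs):

n <- 1e6
iq1 <- rnorm(n, mean = 100, sd = 15)   # Step 1: first random IQ of each pair
iq2 <- rnorm(n, mean = 100, sd = 15)   #         second random IQ of each pair
products <- iq1 * iq2                  # Step 2: multiply them pairwise
over <- products > 12000               # Step 3: is each product greater than 12,000?
sum(over) / n                          # Steps 4-5: proportion of TRUEs, about 0.172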
Article / Updated 03-26-2016
While many programs, apps, and web pages are available to perform power and sample-size calculations, they aren't always easy or intuitive to use. Because spreadsheets like Excel are readily available and intuitive, it's convenient to have a single spreadsheet that can perform power and sample-size calculations for the situations that arise most frequently in biological and clinical research. The Sample Size Calculations spreadsheet gives you a simple way to estimate the sample size you need when designing a study involving any of the following statistical analyses:

Two-group (unpaired) comparison of means by the Student t test

Two-group (paired) comparison by the Student t test

Comparison of two observed proportions by the chi-square or Fisher Exact test

Comparison of one observed proportion versus a fixed proportion value

Test for a correlation coefficient equal to 0

One-way ANOVA for balanced (equal-sized) groups

Comparison of two event rates

Survival analysis by log-rank or Cox proportional-hazards regression

It's important to realize that most of the calculations in this spreadsheet are only approximations. They'll usually give sample-size answers that are within a few subjects of the exact answer, which should be adequate when you're planning a study. But they shouldn't be taken as the "official" answers. Before you use the calculator for any of the preceding calculations, review a few basic instructions and concepts.
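If you prefer scripting to a spreadsheet, several of the same calculations are built into base R's stats package; the numbers below are arbitrary placeholders just to show the calls, and the remaining cases (correlation, event rates, survival) generally need an add-on package such as pwr.

# Two-group unpaired t test: difference of 5 points, within-group SD of 10
power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80)

# Paired t test on the same effect size
power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80, type = "paired")

# Comparison of two proportions (normal approximation, no continuity correction)
power.prop.test(p1 = 0.30, p2 = 0.40, sig.level = 0.05, power = 0.80)

# One-way balanced ANOVA: 4 groups, specified between- and within-group variances
power.anova.test(groups = 4, between.var = 1, within.var = 3, power = 0.80)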
Article / Updated 03-26-2016
The proportion of subjects having some attribute (such as responding to treatment) can be compared between two groups of subjects by creating a cross-tab from the data, where the two rows represent the two groups, and the two columns represent the presence or absence of the attribute. In biostatistics, this cross-tab can be analyzed with a chi-square or Fisher Exact test.

To estimate the required sample size, you need to provide the expected proportions in the two groups. Look up the two proportions you want to compare at the left and top of the following table. (It doesn't matter which proportion you look up on which side.) The number in the cell of the table is the number of analyzable subjects you need in each group. (The total required sample size is twice this number.)

For example, if you expect 40 percent of untreated subjects with a certain disease to die but only 30 percent of subjects treated with a new drug to die, you would find the cell at the intersection of the 0.30 row and the 0.40 column (or vice versa), which contains the number 376. So you need 376 analyzable subjects in each group, or 752 analyzable subjects altogether.

(Table of required per-group sample sizes for each pair of proportions. Credit: Illustration by Wiley, Composition Services Graphics)
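As a rough cross-check on the table, R's power.prop.test() handles this comparison directly. It uses the normal approximation without a continuity correction, so expect a per-group n somewhat below the 376 in the example; sample-size programs that apply the correction land closer to the table's value.

# Per-group n for detecting 30% versus 40% mortality with 80% power, alpha = 0.05
power.prop.test(p1 = 0.30, p2 = 0.40, sig.level = 0.05, power = 0.80)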
Article / Updated 03-26-2016
In biostatistics, when comparing the means of two independent groups of subjects using an unpaired Student t test, the effect size is expressed as the ratio of Δ (delta, the difference between the means of the two groups) divided by σ (sigma, the within-group standard deviation).

Each chart in the following figure shows overlapping bell curves that indicate the amount of separation between the two groups, along with the effect size (Δ/σ) and the required number of analyzable subjects in each group. Pick the chart that looks like an important amount of separation between the two groups. For example, if the middle chart (corresponding to a between-group difference that's three-fourths as large as the within-group standard deviation) looks like an important amount of separation, then you need about 29 analyzable subjects per group (for a total of 58 analyzable subjects).

(Figure of overlapping bell curves for several values of Δ/σ. Credit: Illustration by Wiley, Composition Services Graphics)

For other Δ/σ values, use this rule of thumb to estimate the sample size: You need about 16/(Δ/σ)² analyzable subjects in each group.
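If you'd rather compute than read the chart, R's power.t.test() gives essentially the same answer as the figure and the rule of thumb; for Δ/σ = 0.75 it should report roughly 29 subjects per group.

# Per-group n for an unpaired t test with delta/sigma = 0.75
power.t.test(delta = 0.75, sd = 1, sig.level = 0.05, power = 0.80)   # about 29 per group

# The rule of thumb gives nearly the same answer:
16 / 0.75^2                                                          # about 28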
Article / Updated 03-26-2016
In biostatistics, when comparing paired measurements (such as changes between two time points for the same subject) using a paired Student t test, the effect size is expressed as the ratio of Δ (delta, the mean change) divided by σ (sigma, the standard deviation of the changes). Another, perhaps easier, way to express the effect size is by the relative number of expected subjects with positive versus negative changes. (These ratios are shown below each curve in the figure.)

Each chart in the following figure shows a bell curve indicating the spread of changes, along with the effect size (Δ/σ), the ratio of positive to negative differences, and the required number of analyzable subjects (each subject providing a pair of measurements). Pick the chart that looks like an important amount of change (relative to the vertical line representing no change). For example, the middle chart corresponds to a mean change that is three-fourths as large as the standard deviation of the changes, with about 3.4 times as many subjects increasing as decreasing. If this looks like an important amount of change, then you need 16 pairs of measurements (such as 16 subjects, each with a pre-treatment and a post-treatment value).

(Figure of bell curves of the changes for several values of Δ/σ. Credit: Illustration by Wiley, Composition Services Graphics)

For other Δ/σ values, use this rule of thumb to estimate the sample size: You need about 8/(Δ/σ)² + 2 pairs of measurements.
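The same check works here with R's power.t.test() in paired mode; for Δ/σ = 0.75 it should report roughly 16 pairs, in line with the chart and the rule of thumb.

# Number of pairs for a paired t test with delta/sigma = 0.75
power.t.test(delta = 0.75, sd = 1, sig.level = 0.05, power = 0.80, type = "paired")   # about 16 pairs

# The rule of thumb gives nearly the same answer:
8 / 0.75^2 + 2                                                                        # about 16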