The Gambler’s Z is an offshoot of the Z-test. No, the Z-test is not a pop quiz in math class that only Taylor Zalinsky and Andy Zimmerman had to take. It’s a test in the statistical sense, in which researchers evaluate data to see if the numbers validate or disprove a previously stated hypothesis, and to what degree.
The Z-test comes into play when statisticians have two sets of data and want to figure out whether the difference between them is significant. Say we wanted to test that antibiotic from the last paragraph: We’d take a bunch of petri dishes full of bacteria, apply the antibiotic to some of them, apply chocolate milk or something to the others, and then count the bacterial cells. The result would be two sets of data, and for each, we’d calculate the average and standard deviation. The Z-test is a way to look at those numbers and determine whether the difference we see in the average bacteria count is statistically significant or not. After centuries of people eyeballing data, putting their thumb in the air, or asking the gods whether two sets of data really differed, the Z-test tells you once and for all that yes, chocolate milk is the perfect agent to fight bacteria.
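To make that comparison concrete, here’s a minimal sketch of a two-sample Z-test in Python. The bacteria counts, sample sizes, and the `two_sample_z` helper are all my own illustration, not real lab numbers:

```python
from math import sqrt

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z-statistic for the difference between two sample means."""
    # Standard error of the difference between the two averages
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se

# Hypothetical counts: 40 antibiotic dishes vs. 40 chocolate-milk dishes
z = two_sample_z(mean1=120, sd1=30, n1=40, mean2=200, sd2=35, n2=40)
# A Z-statistic many standard errors from zero says the gap between the
# two averages is very unlikely to be random noise.
```

The farther `z` lands from zero, the less plausible it is that the two dishes differ only by chance.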
Why would comparing two sets of data be useful? It all goes back to that essential question about whether the results we’re looking at are the result of random luck or not. Imagine you are presented with an NFL trend that says “In the last 20 years, teams that score fewer than 17 points three weeks in a row and then score 28 or more have gone under the total 23 times, and they have gone over the total 7 times. It only happens once or twice per season, but you’ll win 77 percent of the time.”
Before you start setting aside money to make this bet, you’ll want to ask yourself three questions:
- Does this trend make sense logically, and is there a reason why bettors or teams would behave in a way that makes this trend predictive? I’ll talk about how to answer that question in later chapters.
- Does the data I’m looking at represent a material difference from a completely random set of data?
- Related to question 2, if it’s not random data, how unusual is this result relative to a random result?
We know that if we did repeated trials of 30 coin flips, we’d get a bell curve centered on 15 heads. The Z-score tells you where on that bell curve a 23-heads trial falls.
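You can see that bell curve for yourself with a quick simulation; this is just a sketch, with the trial count and seed chosen arbitrarily:

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so repeated runs match

# Flip 30 fair coins, count the heads, and repeat 100,000 times
trials = [sum(random.random() < 0.5 for _ in range(30))
          for _ in range(100_000)]
counts = Counter(trials)

# The tally peaks at 15 heads and falls off symmetrically on both
# sides; results like 23 heads sit far out on the right-hand tail.
```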
Doing the Gambler’s Z calculation
To get the Z-score, take these simple steps:
- Get the standard deviation of a 30-trial coin flip for a 50-50 coin, which is the square root of 30 times the probability of heads times the probability of tails, or 30 × .5 × .5 = 7.5. The square root of 7.5 is 2.74.
- Take the difference of the result with the expected result. In this case, the expected result is 15 heads, and we got 23 heads. So, 23 – 15 = 8.
Note that what you care about is the distance between your result and the mean; the sign is not important.
- Divide the result in step 2 by the result in step 1 to get the Z-score, which is the number of standard deviations away from the expected result.
In this case, 8 / 2.74 = 2.9, or nearly three standard deviations.
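The three steps collapse into a few lines of Python; this is just a sketch of the arithmetic above, nothing more:

```python
from math import sqrt

n, p = 30, 0.5          # flips and probability of heads
observed = 23           # heads we actually saw
expected = n * p        # 15 heads

sd = sqrt(n * p * (1 - p))         # step 1: about 2.74
diff = abs(observed - expected)    # step 2: 8 (sign ignored)
z = diff / sd                      # step 3: about 2.9
```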
A Z-score of 2.9 puts your result beyond the 99th percentile in terms of rarity. If you did 100 trials of 30 coin flips, you’d expect a result that far away from 15 heads less than once, on average.

If you calculate a Z-score of less than 2, you should treat it as a random result. Between 2 and 3 standard deviations is open to interpretation, so I’d look for more evidence that it’s a meaningful trend. And at 3 standard deviations and more, I consider it at least a statistically viable trend. That doesn’t mean it’s predictive of future results, but I feel comfortable that it’s not simply a random result.
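If you want the rarity as a number rather than a rule of thumb, the standard normal curve gives it directly. Here’s a sketch using only the standard library; keep in mind the normal curve is an approximation to the true coin-flip distribution:

```python
from math import erf, sqrt

def two_tailed_prob(z):
    """Chance a random result lands at least z standard deviations
    from the mean, in either direction, under the normal curve."""
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

p = two_tailed_prob(2.9)   # roughly 0.004: under 1 trial in 100
```

The familiar 1.96 threshold comes straight out of this function: it’s the Z-score where the two-tailed probability crosses 5 percent.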
Example: Random or not?
Look at the table below, which shows the results of your friend flipping a coin 50 times. By the end, he’s come up with 25 heads and 25 tails. No big surprise there.

H | H | T | H | T | T | T | T | T | H | H | H | T | H | H | H | T | H | T | H | T | H | H | H | T |
H | H | T | H | T | T | H | T | T | H | T | T | T | T | H | H | T | H | T | H | H | T | T | T | H |
But look at the hot streak from flip 10 through flip 27: 13 heads against only 5 tails. Again, let’s go through the steps to find the Z-score for that 18-flip stretch:
- Take the difference of the result you got and the expected result (or the mean). In this case, we got 13 heads when we would have expected 9 (that is, in 18 attempts with a 50-50 coin), so that gives us 4.
- Now we divide that by the standard deviation of this data set. It just so happens the standard deviation for a coin flip is the square root of the number of trials times the probability of each outcome: 18 × .5 × .5 = 4.5, and the square root of 4.5 is about 2.1 heads. In layman’s terms, it would be perfectly normal to get between 6.9 and 11.1 heads on 18 flips (one standard deviation away from the mean), and hardly unexpected to get as few as 4.8 or as many as 13.2 heads (two standard deviations from the mean).
- Dividing 4 by 2.12 (the unrounded standard deviation) gives us a Z-score of about 1.89.
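Running the same arithmetic in Python, as a sketch of the steps above:

```python
from math import sqrt

n, p = 18, 0.5                # flips in the stretch, probability of heads
observed = 13                 # heads in the 18-flip stretch
expected = n * p              # 9 heads

sd = sqrt(n * p * (1 - p))            # about 2.12
z = abs(observed - expected) / sd     # about 1.89, short of the 1.96 bar
```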
If you find a trend with a Z of less than 1.96, the results are not unusual enough to say definitively that you’re not just looking at statistical noise. I structured this example deliberately because I wanted you to see how vulnerable the Z-score is to misinterpretation. As good as 13–5 looks in the moment, we can see in the context of all 50 flips that it’s meaningless; you’ve stumbled into a lucky streak and nothing more. It’s also entirely possible that you’re on to something, that you’ve identified a non-random phenomenon, but you’ll have to keep measuring to prove it statistically.