There is a lot of persistent misinformation in statistical analysis. But the piece I run into most often of late is this:
Post hoc statistical power analysis doesn’t matter. You have a p-value, so your decision to reject or accept the null hypothesis is already made.
And I’m just sitting here thinking, “What!?”
But, unfortunately, a lot of epidemiologists, toxicologists, pharmacologists, and risk assessors all agree with this idea that post hoc power analysis is irrelevant.
A companion idea I heard profs in grad school push was that so long as the p-value is below 0.05, power doesn’t matter, because you can’t commit a false negative at that point.
Let me be clear: the idea that a study with a significant p-value cannot be a false positive is simply asinine and is not based on any actual statistical theory. And the idea that post hoc power analysis is unnecessary is just as asinine.
Let’s explore this.
Breaking Statistical Power
When you perform a statistical analysis, what you want is to make an inference about the population. To do that we make a lot of assumptions. In frequentist stats (what most people do; i.e., not Bayesian stats) the key assumption is that the data you collect is an accurate representation of the population of interest. IF THAT IS NOT TRUE, THEN YOUR STATISTICAL INFERENCES BASED ON YOUR ANALYSIS WILL NOT BE USEFUL.
Ya know what else isn’t useful? A false positive or a false negative.
So What’s a False Positive?
A false positive in toxicology is when you say treatment is different from vehicle when treatment is in fact the same as vehicle.
So What’s a False Negative?
A false negative in toxicology is when you say treatment is no different from vehicle when treatment is in fact different from vehicle.
What on Earth is a Post Hoc Power Analysis???!!!
Post hoc means after the fact.
Power analysis is an analysis we perform to see what your statistical power is.
Statistical power is 1 – false negative rate. So if your false negative rate is 20%, then you have 80% statistical power.
So a post hoc power analysis is a power analysis you perform after you get a p-value.
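To make the "power = 1 – false negative rate" arithmetic concrete, here is a minimal sketch of a power calculation. It is not taken from any particular study. It assumes a two-sided, two-sample z-test with a known, equal standard deviation in both groups, and the function name, effect size, and sample sizes are all made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumptions (illustrative only): known and equal SD in both groups,
    equal group sizes, normally distributed test statistic.
    """
    std_normal = NormalDist()
    se = sd * sqrt(2.0 / n_per_group)            # standard error of the difference in means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = effect / se                          # true effect in standard-error units
    # Probability that the test statistic lands in either rejection region
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Hypothetical effect of 0.5 SD between treatment and vehicle
for n in (5, 10, 20, 50):
    power = two_sample_power(effect=0.5, sd=1.0, n_per_group=n)
    print(f"n per group = {n:3d}: power = {power:.2f}, "
          f"false negative rate = {1 - power:.2f}")
```

Plug in your own effect size, standard deviation, and group size to see how quickly power drops as the groups get small.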
Dude, I Have a p-Value, Why Do I Care About Power Now?!
Yeah, so, let’s talk about this. You’ve got a p-value. And let’s say that p-value is 0.01, so it’s less than your nominal threshold of 0.05. Before you pop that champagne, ask yourself a quick question: “Is that significant result real, or is it likely a false positive?”
Now, some will say, “Too late, dude, I got my p-value, the jury returned a verdict, and it’s significant. It doesn’t matter how much power I have because I’m not committing a false negative.”
There is a wealth of literature showing that p-values only work well with large statistical power, and that at lower statistical power a significant p-value is much more likely to be a false positive. Morris demonstrated this with a relatively simple example, and Casella and Berger concur with Morris. Gelman and Carlin have shown it too, as has Christley, and it’s easy to demonstrate through simple simulations.
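One simple way to see the link between power and false positives is a back-of-the-envelope calculation. This is my own sketch, not the specific argument made in any of the papers named here. It rests on a loud assumption: that you can put a number on the share of hypotheses you test that are actually true (`prior_true` below, a hypothetical input). Given that, the share of significant results that are false positives follows directly from alpha and power.

```python
def false_positive_share(power, prior_true, alpha=0.05):
    """Share of significant results that are false positives.

    prior_true is an ASSUMED fraction of tested hypotheses where a real
    effect exists; it is an illustrative input, not something the data give you.
    """
    false_pos = alpha * (1 - prior_true)   # null is true, test rejects anyway
    true_pos = power * prior_true          # effect is real, test detects it
    return false_pos / (false_pos + true_pos)


# Hypothetical scenario: 1 in 10 tested effects is real
for power in (0.8, 0.5, 0.2, 0.1):
    share = false_positive_share(power, prior_true=0.1)
    print(f"power = {power:.1f}: {share:.0%} of significant results are false positives")
```

Alpha stays at 0.05 the whole time. The only thing that changes is power, and the share of significant findings that are wrong climbs as power falls.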
The idea that you can have a small sample size, get a significant p-value, and then ignore statistical power defies logic!
Aside: Actually, Type S and Type M Errors Would Be Better
To be bluntly honest, for the same reasons mentioned above (false positives) it would be better to take a more Bayesian approach and calculate power based on a less biased estimate of the effect size (the difference between treatment and vehicle effects). This point was made by Gelman and Carlin.
And to be even more bluntly honest, you should actually be calculating the Type S and Type M errors.
But here’s why you should be going to the literature and trying to get a better estimate of your effect size.
It’s because your study’s estimates of the treatment and vehicle effects are going to be biased, unless your sample sizes are large.
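For a feel of what Type S (wrong sign) and Type M (exaggerated magnitude) errors look like, here is a rough sketch in the spirit of the design calculation Gelman and Carlin describe. It is my own simplified version: it uses a normal approximation rather than a t distribution, assumes the true effect is positive, and the function name and inputs are illustrative. The true effect you pass in should come from outside your study, for example from the literature.

```python
import random
from statistics import NormalDist


def retrodesign(true_effect, se, alpha=0.05, n_sims=100_000, seed=1):
    """Return (power, type_s, type_m) under a normal approximation.

    true_effect: ASSUMED plausible true effect (must be > 0 here).
    se: standard error of the effect estimate in the study design.
    """
    if true_effect <= 0 or se <= 0:
        raise ValueError("this sketch assumes true_effect > 0 and se > 0")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = true_effect / se
    p_hi = 1 - std_normal.cdf(z_crit - shift)   # significant with the correct sign
    p_lo = std_normal.cdf(-z_crit - shift)      # significant with the wrong sign
    power = p_hi + p_lo
    type_s = p_lo / power                       # P(wrong sign | significant)
    # Type M: simulate estimates and average the magnitude of the significant ones
    rng = random.Random(seed)
    estimates = (rng.gauss(true_effect, se) for _ in range(n_sims))
    significant = [abs(e) for e in estimates if abs(e) > z_crit * se]
    type_m = (sum(significant) / len(significant)) / true_effect
    return power, type_s, type_m


# Hypothetical designs: same true effect, noisy study vs. precise study
for se in (2.0, 0.5, 0.2):
    power, type_s, type_m = retrodesign(true_effect=1.0, se=se)
    print(f"se = {se}: power = {power:.2f}, Type S = {type_s:.3f}, "
          f"Type M (exaggeration ratio) = {type_m:.2f}")
```

The noisy design is the one to watch. Any estimate that clears the significance bar there has to be much larger than the true effect, so the exaggeration ratio comes out well above 1.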
Why Your Treatment and Vehicle Effects Are Biased with Small Sample Sizes
The Law of Large Numbers is the culprit here.
The Law of Large Numbers is what allows casinos to take your money (while saying they have payouts that appear to favor you to some degree). It’s also the Law that says large sample size studies are more likely to be good estimates of the population they are drawn from. Likewise, it’s also the Law that says that small sample sizes are not good at replicating the population.
The Law of Large Numbers says that the mean of a sample converges to the population mean as the sample size approaches infinity. Likewise, the sample median and variance get closer to the population median and variance, respectively.
But, this also helps to explain why we see a lot of variability in measures, like LD50s, for the same chemical. When you have small sample sizes, and are measuring the same effect across different labs, you will see variance. The labs will not agree on an effect until all of the labs are using a very large number of experimental units (i.e., the thing receiving the treatment; could be a cage, could be an animal, could be cells).
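A quick simulation makes the lab-to-lab disagreement visible. This is a toy sketch with made-up numbers: every "lab" draws from the same hypothetical normal population (mean 100, SD 20), so any disagreement between labs comes purely from sample size. The function name and all parameters are illustrative.

```python
import random
from statistics import fmean, stdev


def lab_estimates(n_units, n_labs=1000, pop_mean=100.0, pop_sd=20.0, seed=0):
    """Simulate the sample mean reported by each of n_labs labs.

    Every lab measures n_units experimental units drawn from the SAME
    hypothetical population, so the true value is identical across labs.
    """
    rng = random.Random(seed)
    return [
        fmean([rng.gauss(pop_mean, pop_sd) for _ in range(n_units)])
        for _ in range(n_labs)
    ]


for n in (5, 50, 500):
    means = lab_estimates(n)
    print(f"n = {n:3d} units per lab: lab means range from "
          f"{min(means):.1f} to {max(means):.1f} (SD across labs = {stdev(means):.2f})")
```

With a handful of units per lab, the reported means scatter widely around the true value. The scatter shrinks roughly with the square root of the sample size.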
So What’s the Bottom Line?
Yes, you do need to perform a post hoc power analysis. If you don’t, I will do it for you, and you may not like what I have to say.
It’d be better still if you performed Type M and Type S analyses. You may have a hard time working out what your actual effect size should be, but the effort is worth it because the effect size estimated in your study is likely biased, especially if you have a significant p-value.