I’ve read a lot of micronucleus studies lately. And I’m not here to call anyone out. Honestly, I’m here to help. But the first step in an intervention is to help you realize there’s a problem. And when it comes to statistics, I’m afraid most of what you may have been told over the years is simply wrong. But don’t despair, I’m here to help.
I’m going to discuss challenges with nesting, the fact that your data probably isn’t actually normally distributed, and why chi-squared and Fisher’s exact test simply don’t work for this data. And at the end I’m going to walk you through how to perform a treatment vs control analysis a better way — using Bayesian statistics.
Nesting – The Most Common Challenge
One thing I absolutely love about most micronucleus assay protocols is that samples are typically duplicated. What I mean is that it’s common to take a biological sample and plate it on two slides for duplicate cell counts. That is great!
The issue is that the two slides aren’t independent biological replicates or experimental units. They are technical replicates. Technical replicates are great because they help us understand what the variance is associated with an operation — in this case, cell counting. That’s a great QC practice.
But there are two problems I commonly see:
- Investigators will treat the duplicate slides as independent samples (thus inflating their sample size by 2x)
- Investigators don’t used a nested analysis approach
Can you take the mean of the duplicate slides and use that in your analysis? Yeah, you can. It’s not awful. But I’d prefer you use a nested analysis (that way you can still use that variance to better understand your data).
But whatever you do, do NOT treat duplicate slides as independent samples. I see that way too often in the published literature. That’s a sure-fire way to inflate your false-positive rate and get better p-values.
What’s This Nested Analysis You Keep Talking About?
In Bayesian analysis we would perform a hierarchical Bayesian analysis — that hierarchical nature handles the nesting in a very intuitive way (we model the nesting directly). If you are committed to a Frequentist approach, that’s cool, you’d want to use a General Linear Mixed Model with the slide effect as a random effect.
Not sure how to do that? I’ve got a course for you to try, or contact me for coaching.
No, Your Data Aren’t Normally Distributed
None of the micronucleus data I’ve seen lately are normally distributed. Here’s a quick test to tell if your data are normally distributed or not that you can do in Excel:
I’m using the norm.dist() function in Excel to calculate the amount of the distribution that is less than the number in Column A, using the mean in cell F1, and the standard error in F2 or its corresponding standard deviation in F3 — note that I am using the sqrt of 5 because I have 5 samples in this example.
Cell B2 has a value of 0.03 which means 3% of the distribution is less than 0 in this case — in other words, 3% of the distribution is negative. We can’t have negative cell counts. Therefore, a normal distribution is not a good choice for this data.
And yes these are real data — I took them from this paper.
But in general, cell counts are not normally distributed, even if negative values aren’t coming up using the approach in Excel I showed you. Instead, we typically model counts of things as Poisson, gamma, negative-binomial, binomial, etc.
You might be asking why binomial? And the reason is because binomial is what we use when we have data that are either/or. So you either have a micronucleus or you don’t. That’s either/or, so that’s actually binomial.
Now someone might say, “I was told when my sample sizes get large enough, even a binomial distribution becomes normal because of Central Limit Theorem and the Law of Large Numbers…” Yeah, sure, that’s true, as you approach infinity. That’s the same argument casinos use for the odds of you winning at video poker, slots, roulette, etc. It’s possible…sure, but there are a lot of conditions you have to rely upon to get there.
And Because Your Data Is Binomial…
That means we normally view your data as some type of contingency table. Like a 2 x 2 table:
Analyzing this type of data by an ANOVA or t-test or something similar is simply inappropriate. The proportions matter — that’s what you’re actually interested in. And proportions simply do not meet the assumptions required of an ANOVA or t-test in most circumstances.
Because it’s a contingency table, most people will naturally gravitate towards analyzing the data using either Pearson’s chi-squared goodness of fit test. Some people call it a pairwise chi-squared test — I’m not sure where that language comes from, but it seems a bit inaccurate to me, because paired tests usually connote tests where samples are matched/paired. The other option is Fisher’s Exact Test. Let’s talk about why these aren’t good options.
Fisher’s Exact Test
Fisher’s Exact Test has a really tough requirement a lot of non-statisticians don’t know about. I’ll let ya in on the secret — *jargon alert* it only works where the margins are predefined.
So what does that actually mean?
Look at that 2×2 table. See the numbers 8 and 5,992 at the bottom, and 3,000 and 3,000 on the right? Those totals are the margins. The 3,000 and 3,000 are fixed — that’s the number of cells analyzed. But the 8 and 5,992 are not fixed — those are random. They will change based on what you see — they’re the data/results.
Because those values aren’t fixed, but are random, you can’t use the Fisher’s Exact Test. It’s that simple.
But What About Chi-Squared?
Bad news, Bear: chi-squared doesn’t work well when you have small numbers in the cells. Typically, the micronucleus assay results in small numbers in those cells. What that means is that the p-value from the chi-squared analysis is going to be highly inaccurate. That 3 and 5 — those are non-starters.
There are ways of getting around that, but those are a bit too deep and technical for us here. Chi-square also doesn’t calculate an exact p-value, and there can be issues when your sample sizes get too large. Chi-squared is kinda like Goldilocks — it requires everything to be just right to be used properly.
But don’t worry there’s an easy way out.
This is How You Analyze It
We model the data that you have using a beta-binomial distribution. What you actually care about is whether or not the proportion of cells with micronuclei is greater in a treatment compared to a vehicle control. That’s what the beta-binomial distribution handles for us.
I admit, you’re going to have to run R. I’m going to walk through this procedure here, but if you take my Bayesian analysis course, you’ll learn this technique, plus many others, and how to use R.
In this case, I’m just going to look at a comparison of 2 groups — vehicle and a treated. If you have a case where you have multiple doses and want to make a comparison to a single vehicle, then I’d recommend that you run a hierarchical model in R using Stan.
This simple case of just a vehicle and a single treated group can be performed algebraically in R. The beta-binomial distribution will calculate the probability, p, with uncertainty, that a cell will be micronucleus positive, and 1-p that it is micronucleus negative. In this example we are assuming we have no prior knowledge of what the vehicle probability should be (this probably isn’t so, as your vehicle has probably been studied by your lab or others extensively already, but for simplicity we’ll assume there’s no prior information).
We’re going to assume that you have 15 samples where you have measured micronucleus counts per 1,000 cells in the vehicle. So, for the vehicle we would simulate the posterior distribution as:
veh <- c(2.5, 2.0, 1.5, 2.0, 2.5, 1.5, 1.5, 1.0, 1.5, 1.0, 2.5, 2.0, 1.5, 1.0, 1.5 ) veh_distro <- rbeta(10000, sum(veh), 15000-sum(veh))
What this says is that we’re creating a new object to hold our vehicle count data, calling it veh. We’re then going to build the beta-binomial distribution using the beta distribution function in R, specifically rbeta which generates random samples from the beta distribution.
The 10000 means we are randomly sampling 10,000 times from the beta distribution. The beta distribution takes two additional parameters. The first one, sum(veh), is the sum of count frequency of micronucleus found per 1000 cells. The second parameter, 15000-sum(veh), is the number of cells that do not have a micronucleus.
Let’s say our treatment group was 250mg/kg of ethymethylbadstuff. And let’s say that it’s counts in samples from 5 animals were: 2.0, 4.5, 3.5, 2.0, and 3.0 per 1,000 cells. Note that these are taken from duplicate slides. This would be encoded as such:
d_250 <- c(2.0, 4.5, 3.5, 2.0, 3.0) d_250_distro <- rbeta(10000,sum(d_250), 5000-sum(d_250))
So now we have 10,000 random draws from the vehicle and treated beta distributions. Next, we’ll calculate the difference between them, and visualize it:
hist(d_250_distro - veh_distro)
And this yields the following histogram:
Now, before we started down this road, what we should have done is to define the Region of Practical Equivalence (ROPE). The ROPE signifies the region of differences that are biologically the same as no difference. In other words, differences outside the ROPE are biologically meaningful. For instance, I think we would all agree that a difference of 0.001, or 1 per 1,000 is probably not biologically meaningful. What we are looking for is the number of micronuclei per 1,000 cells that is demonstrably adverse. To be clear — no matter how you analyze the data, this is always a statistical requirement. The reason why is because as the number of samples increases, the likelihood of getting a p < 0.05 also increases. Given enough samples, you’ll have a p < 0.05 when the effect size (e.g., difference in group means) is extremely small and not biologically meaningful.
It still appears that the genotox literature hasn’t yet set a biologically meaningful threshold for increases in micronuclei rate. Thus, probably the best threshold at this time is that the treated group must be differentiable from the vehicle. In other words, the ROPE would be the 95% highest density interval (HDI) of the vehicle (the region of the vehicle population that contains 95% of the density).
To find the 95% HDI for the vehicle we will use the HDInterval package in R. If you don’t have it, you’d run this only once to install it:
Once installed, you just run the following command to load it so you can use it:
So, now we’ll load HDInterval and calculate the 95% HDI for vehicle:
library(HDInterval) veh_hdi <- hdi(veh_distro)
If you want to see what the 95% HDI looks like for vehicle, do this:
Which will return this:
lower upper 1.029945e-05 4.192489e-03 attr(,"credMass")  0.95
So the ROPE for vehicle spans from 1.03 x 10^-5 up to 4.19 x 10^-3 — that’s quite a span, and it’s all really tiny.
You can visualize the ROPE on that histogram for the difference between treated and vehicle this way:
hist(d_250_distro - veh_distro) veh_hdi <- hdi(veh_distro) abline(v = veh_hdi - mean(veh_distro)) abline(v = veh_hdi + mean(veh_distro)) diff_hdi <- hdi(d_250_distro - veh_distro) segments(diff_hdi, y0 = 3000, diff_hdi, y1 = 3000)
The first line draws the histogram for the difference distribution — that is, the difference between the treated and vehicle distributions. The next line calculates the vehicle highest density interval. The next line draws a vertical line at the lower end of the vehicle HDI relative to zero on the difference distribution. The next line draws a vertical line at the upper end of the vehicle HDI, again, relative to zero on the difference distribution. The next line calculates the 95% HDI for the difference distribution. The final line draws the 95% HDI for the difference distribution onto the histogram.
Decision Rule: If the 95% HDI of the difference distribution is completely outside the ROPE, then the treatment is considered to be from a different distribution compared to the vehicle. If the 95% HDI of the difference distribution is inside any part of the ROPE, then the treatment is considered to not be different from the vehicle.
We can see that the 95% HDI for the difference distribution is drawn (straight horizontal line) at the frequency = 1,500. We can also see that the 95% HDI is contained entirely within the ROPE. What this means is that 0 difference is a credible value, and therefore, the 95% HDI is contained entirely within the ROPE, and therefore treatment is not different from the vehicle, given our uncertainty.
How Could We Improve This Analysis?
For starters, we could have used prior information about the vehicles. There are many times that tests come back statistically, with large effect sizes, and it’s due simply to the fact that the vehicles were in the lower range of the population distribution. This still qualifies as a false positive. And this is more likely to happen in the scientific literature so long as the file drawer effect is occurring — that is, so long as negative results are less likely to be published.
Prior information for the vehicles is just added to the beta distribution. It’s just like we did earlier when building the two distributions. We are summing up for each sample. If we believe the samples for the prior are exchangeable (which means that they are similar enough to be combined), we can just arithmetically add them in. It’s pretty straightforward and easy.
So that’s it. If you’re using ANOVA, t-tests, Fisher’s Exact Test, or chi-squared to analyze your micronucleus data you really should stop.
A far better alternative is to use the beta-binomial analysis described here. If you want to learn more about Bayesian analysis, and how to perform this type of analysis when you have multiple treatment groups, reach out for some coaching or take my class.