Do Phthalates Really Lead to Early Death?

The available evidence suggests no. Phthalates in consumer products are not going to kill you, and they are not leading to early death. The study behind the headlines has some significant issues, and most importantly, correlation (which is all the study shows) is not causation.

So what’s going on here?

Recently, The Hill, CNN, The Guardian, the New York Post, and many other media outlets have run headlines like, “Synthetic ‘everywhere chemicals’ linked to deaths among older Americans: study” and “Shocking study says chemicals found in shampoo, makeup may kill 100k Americans prematurely each year.”

Each of these stories cites a study by Trasande, Liu, and Bao, currently in press at Environmental Pollution (made available in advance on October 12, 2021).

The Toxic Truth: Nope, phthalates won’t kill you. The study was chasing noise, its subjects don’t represent the general US population, and ultimately it is a flawed correlation analysis that cannot establish causality.

What this all means is that their analysis reports a weak effect, but viewed through a Bayesian lens it becomes clear that there really is no effect at all. Moreover, when I compared the mortality rate in the group with the lowest phthalate levels to that of the US population, they were very different. That suggests some sampling bias, which tends to lead to false positive results (especially when chasing noise).

What Are Phthalates?

Before we jump too far into this, let’s talk about phthalates.

Phthalates are chemicals added to plastics to make them more flexible and durable; for this reason they are sometimes called “plasticizers”. They are also used in consumer products for various other reasons, sometimes as gelling agents. The key thing to know is this: like all chemicals, phthalates are innocuous in small amounts but may become toxic at higher concentrations. The same is true for water, table salt, sugar, vinegar, you name it.

So What Happened in the Trasande et al Study?

Trasande et al make the assumption that the data they are using are representative of the population, and that we can largely rely upon the means (and variance) from their data to draw conclusions about the population (this is a bit simplified, but it’s accurate).

Here’s the problem with the Trasande et al approach, especially when dealing with human populations: sampling bias. Sampling bias results when the sample we draw does not match the characteristics of the underlying population, which becomes more likely as the sample gets smaller. This is the reason that drugs need to have what we call “Phase 4” clinical trials or post-market surveillance — the fancy words for “keep studying what effects the drug might have on the actual population so that we can pull the drug off the market if it looks like the population is responding differently from the sample in the earlier clinical trials.”

Trasande et al are only looking at 5,303 people! The US population was around 285 million in 2001 when the study started, and roughly 310 million in 2010 when the people studied gave their last urine samples. Subjects continued to be followed to see who may have died until 2015 (when the US had a population of about 320 million). Any way you slice it, that’s less than 0.002% of the US population being studied.

So what Trasande et al are doing is trying to say that people with the highest levels of high molecular weight phthalates (these are specific types of phthalates) are more likely to die than people who have the lowest levels.

What don’t we know? We don’t know if the overall US population has higher, moderate, or lower levels of these phthalates. We also don’t know if the groupings of phthalate levels are the best groupings in terms of biology. For instance, all chemicals will have some threshold concentration below which we will not see toxicity/disease. So one question is which of these levels (the low, moderate, or high) are above or below the threshold? Could it be that one of these levels straddles the threshold? Could it be that all of these people are below the anticipated threshold? There are a lot of questions here that the study simply doesn’t address.

The authors of the study arbitrarily divided people into three groups: group one had the lowest levels and is the reference group the others are compared to (called Tertile 1), group two had intermediate amounts (Tertile 2), and group three had the highest amounts (Tertile 3). Note that people were divided roughly evenly by count into these three groups, not by any biological threshold — which is why the groupings really, truly are arbitrary.
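To see how count-based tertile assignment works, here is a minimal sketch using pandas with hypothetical phthalate concentration values (the study’s actual data come from NHANES and are not used here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical urinary phthalate metabolite concentrations (NOT study data)
df = pd.DataFrame({"phthalate_conc": rng.lognormal(mean=0.0, sigma=1.0, size=9)})

# qcut splits subjects into three equally sized groups by rank,
# regardless of where any biological toxicity threshold might lie
df["tertile"] = pd.qcut(df["phthalate_conc"], q=3, labels=["T1", "T2", "T3"])
print(df["tertile"].value_counts().sort_index())
```

The point of the sketch: `qcut` guarantees three equal-sized bins, so the cut points are determined entirely by the sample, not by any dose-response threshold.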

What I like to do is to look at these groupings as distributions. I don’t have the raw data, but I don’t need it in this case. Instead, I’m going to construct likely distributions, making one key assumption — that the data follow a beta-binomial model. This is the model we typically use when we have proportion data (x out of y have some characteristic we care about). In this case, group 1 (the lowest tertile) had 324 deaths out of 15,780 person-years in the study (see image below, red histogram). Person-years are the number of years each person was in the study before they died or stopped participating. Tertile 2 had 346 deaths out of 15,533 person-years (blue histogram, below). Tertile 3 had 344 deaths out of 15,512 person-years (yellow histogram, below). The x-axis is the probability of death, while the y-axis is the count of samples at each probability of death. If you want the code for this, email me; I’m happy to share.
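The construction above can be sketched as follows. This is my own reconstruction, assuming a flat Beta(1, 1) prior on each tertile’s death rate — it is not the study’s code, and the author’s actual implementation may differ:

```python
import numpy as np

rng = np.random.default_rng(42)

# Deaths and person-years per tertile, as reported in the text
tertiles = {
    "Tertile 1": (324, 15780),
    "Tertile 2": (346, 15533),
    "Tertile 3": (344, 15512),
}

# Under a flat Beta(1, 1) prior, the posterior for each death rate is
# Beta(deaths + 1, survivors + 1); draw samples to build each histogram
samples = {}
for name, (deaths, person_years) in tertiles.items():
    samples[name] = rng.beta(deaths + 1, person_years - deaths + 1, size=100_000)
    print(f"{name}: mean death rate ~ {samples[name].mean():.4f}")
```

Plotting the three sample arrays as histograms reproduces the red, blue, and yellow distributions described above.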

What one sees immediately is that Tertiles 2 and 3 are nearly identical in terms of the shape of their distributions — their average death rates are nearly identical at 0.0222 (or 2.22%) each. Tertile 1 has a mean death rate near 0.0205 (2.05%). But critically important is the fact that the three distributions overlap — a lot. The easiest way to see how much they overlap is to subtract one distribution from the other. So that’s what I did next:

In the above distribution, we can see that the average difference between Tertile 1 and Tertile 2 is around 0.0017 — just 0.17 percentage points, not a very big difference.

What about the difference between Tertile 1 and Tertile 3?

The difference here (above) is 0.0017 as well — again just 0.17 percentage points. That’s essentially the same as the difference between Tertiles 1 and 2, and likewise not very big.

But notice that 0 is definitely a credible value in both cases: 14% of the values are smaller than 0, which means 86% are larger than 0. Our typical decision criterion is that 0 must fall outside the highest density interval; in this case, the highest density interval ranges from -0.0015 to 0.0049, which includes 0. Since 0 is a credible value, there is a very high likelihood that Tertiles 1, 2, and 3 all come from the same parent population distribution.
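The difference-of-distributions check can be sketched like this, again assuming flat Beta(1, 1) priors on the death rates (my own reconstruction, not the study’s code; exact values will vary slightly with the simulation seed):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Posterior samples of the death rate in Tertiles 1 and 3
t1 = rng.beta(324 + 1, 15780 - 324 + 1, size=N)
t3 = rng.beta(344 + 1, 15512 - 344 + 1, size=N)

diff = t3 - t1  # Tertile 3 minus Tertile 1
print(f"mean difference: {diff.mean():.4f}")
print(f"P(diff < 0):     {np.mean(diff < 0):.2f}")

# 95% highest density interval: the narrowest interval containing 95% of samples
def hdi(x, mass=0.95):
    x = np.sort(x)
    n_in = int(np.ceil(mass * len(x)))
    widths = x[n_in - 1:] - x[: len(x) - n_in + 1]
    i = np.argmin(widths)
    return x[i], x[i + n_in - 1]

lo, hi = hdi(diff)
print(f"95% HDI: ({lo:.4f}, {hi:.4f})")  # 0 falls inside this interval
```

Because the HDI straddles 0, the simulated difference is consistent with no real difference between the tertiles.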

So any difference Trasande et al say they can detect is likely just noise — a product of the fact that they are running their statistical models on the sampled data alone, rather than on anything representative of the actual population.

Trasande et al’s data support this notion, too. They are performing a lot of comparisons, and their hazard ratios are generally quite close to 1.0. The closer a hazard ratio is to 1.0 (the value indicating no effect), the more likely it is to be within the noise zone.

The bigger issue here is that Trasande et al are not performing multiple test corrections. When the hazard ratios are this close to 1.0, it doesn’t matter how small the p-value might be — we are still more likely than not to be looking at false positives. This is especially true when considering how much higher the mortality rate is in these individuals in the study (close to 2%) compared to the nation-wide average at the comparison time-period (0.87%). That means that the samples are not very representative of the nation-wide population.
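To illustrate what a multiple-test correction does, here is a minimal sketch of the Benjamini-Hochberg procedure using made-up p-values (these are NOT the study’s actual values):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of which hypotheses survive FDR control."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    ranked = p[order]
    m = len(p)
    # Find the largest k with p_(k) <= (k/m) * alpha; reject hypotheses 1..k
    thresholds = (np.arange(1, m + 1) / m) * alpha
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject

# Ten hypothetical comparisons: with this many tests, a lone p = 0.03
# no longer clears the corrected threshold
pvals = [0.001, 0.03, 0.04, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
print(benjamini_hochberg(pvals))
```

The point: p-values that look “significant” in isolation can evaporate once you account for how many comparisons were run, which is exactly the concern with a study performing many uncorrected tests.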

There is one error that I found in Trasande et al’s paper. The CDC Wonder database shows that the crude mortality rate for 2013-2014 per 100,000 is actually 865.2, not the 965.2 as reported in Trasande et al’s paper (see below):

However, this appears to be merely a typo in the Trasande et al paper, as 865.2/100,000 * (1.48 - 1.00) = 415/100,000 — which is the incremental mortality rate reported in Table 4. In other words, the correct value was used in the calculation, so this looks like an honest mistake rather than a substantive error.

Also, as a practical matter, Trasande et al’s results are correlative in nature, which means they cannot demonstrate a causal relationship between the high molecular weight phthalates and mortality in people.

So, the Bottom-Line Is:

  1. Phthalates are not likely to increase mortality in humans; the study is merely correlative, there is no way to say definitively that these people died due to phthalates, and to say otherwise is misleading.
  2. The result in Trasande et al is likely to be noise.
  3. The NHANES samples used by Trasande et al to represent the lowest-exposed group also exhibit a higher mortality rate than the US average, meaning the samples do not represent the overall population.

Like What You Read?

If you liked this article, please subscribe to get updates when new articles are posted or this article is updated.

Also, feel free to share with your friends and family, and help them get a better idea of the Toxic Truth about the chemicals around all of us.

Lyle D. Burgoon, Ph.D.
Dr. Burgoon is a pharmacologist/toxicologist cross-trained in biostatistics and software engineering. Dr. Burgoon writes on chemical safety, biostatistics, biosecurity, sustainability, and scientific ethics. He is the President and CEO of Raptor Pharm & Tox, Ltd, a consulting firm.
