When confidence intervals stop meaning what you think they mean
Every few years, the same statistical advice makes the rounds again: “Don’t judge differences by whether confidence intervals overlap.” That advice is correct. It’s also old. And, at least for normally distributed means, it’s largely settled science.
So if that were the whole story, there wouldn’t be much left to say.
But here’s the part that isn’t settled — and that, in practice, matters far more.
Asymmetric confidence intervals are everywhere
In modern applied research, especially in medicine, epidemiology, and the social sciences, confidence intervals are very often asymmetric. Think of:
Odds ratios
Hazard ratios
Rate ratios
Quantiles
Bootstrap confidence intervals
Robust estimators
These intervals are routinely shown in forest plots, tables, and figures. They are computed correctly, displayed correctly, and interpreted correctly with respect to the null value (e.g., \(\mathrm{HR} = 1\)).
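To see where the asymmetry comes from, here is a minimal sketch in Python of the usual construction for ratio measures, with illustrative (made-up) numbers: a symmetric Wald interval is built on the log scale and becomes asymmetric only when it is exponentiated back to the ratio scale.

import math

# Illustrative numbers: a log hazard ratio and its standard error,
# as they might come out of a Cox model.
log_hr = math.log(0.72)   # point estimate on the log scale
se = 0.125                # standard error of the log hazard ratio
z = 1.96                  # ~97.5th percentile of the standard normal

# Symmetric 95% Wald interval on the log scale ...
lo_log, hi_log = log_hr - z * se, log_hr + z * se

# ... becomes asymmetric once exponentiated back to the ratio scale.
hr, lo, hi = math.exp(log_hr), math.exp(lo_log), math.exp(hi_log)

print(f"HR = {hr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"distance below the point estimate: {hr - lo:.2f}")
print(f"distance above the point estimate: {hi - hr:.2f}")

On the log scale the interval is perfectly symmetric; on the hazard-ratio scale the point estimate sits closer to the lower limit than to the upper one.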
And yet, something subtle happens the moment we start comparing them visually.
Forest plots get one thing exactly right
Let’s be clear about this upfront: forest plots in randomized trials are not the problem.
In a standard forest plot, what you see is a contrast — a difference, a log-ratio, or some other effect estimate — along with its confidence interval. Checking whether that interval includes the null value is a perfectly valid inferential step.
This holds regardless of whether the CI is symmetric or asymmetric.
No issue there.
Where things quietly go off the rails
The problem starts one step later, and it’s a step we take almost automatically.
Consider a subgroup analysis in a randomized trial. You see something like this:
Subgroup A: \(\mathrm{HR} = 0.72\) (95% CI 0.57–0.92)
Subgroup B: \(\mathrm{HR} = 0.92\) (95% CI 0.64–1.31)
\(p_{\text{interaction}} = 0.23\)
Statistically, the message is straightforward: the interaction test (\(p = 0.23\)) provides no evidence that the treatment effect differs between subgroups.
Visually, however, the message feels different. One interval excludes 1. The other doesn’t. The intervals look quite different in width and position. It is almost irresistible to think: “It works here, but not there.”
At that point, no one is explicitly computing CI overlap. No one is writing down a test statistic. But inference is happening anyway — by eye.
Why asymmetry breaks the intuition
With symmetric confidence intervals for means, the idea of “overlap” at least has a rough geometric interpretation. You can argue about how conservative it is, but you know what you’re looking at.
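To make that precise in the textbook case of two independent, normally distributed estimates \(\hat\theta_1\) and \(\hat\theta_2\) with standard errors \(\mathrm{SE}_1\) and \(\mathrm{SE}_2\): the two 95% intervals fail to overlap exactly when

\[ |\hat\theta_1 - \hat\theta_2| > 1.96\,(\mathrm{SE}_1 + \mathrm{SE}_2), \]

whereas the two-sided test of \(\theta_1 = \theta_2\) rejects at the 5% level when

\[ |\hat\theta_1 - \hat\theta_2| > 1.96\,\sqrt{\mathrm{SE}_1^2 + \mathrm{SE}_2^2}. \]

Since \(\mathrm{SE}_1 + \mathrm{SE}_2 \ge \sqrt{\mathrm{SE}_1^2 + \mathrm{SE}_2^2}\), judging by non-overlap is conservative, but at least it is a well-defined criterion.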
With asymmetric intervals, that intuition collapses:
The point estimate is no longer centered.
The left and right sides of the interval do not have comparable meaning.
The visual distance between two intervals depends on scale, transformation, and estimator — not just uncertainty.
Two asymmetric confidence intervals can overlap substantially and still correspond to a statistically meaningful difference in effects. They can also look strikingly different while the interaction test clearly says “no evidence of effect modification.”
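To make the first half of that claim concrete, here is a small numerical illustration in Python, using made-up hazard ratios and assuming both intervals are normal-approximation (Wald) intervals on the log scale: the two intervals overlap on the ratio scale, yet the formal test of equal effects rejects at the 5% level.

import math
from statistics import NormalDist

def wald_ci(log_hr, se, z=1.96):
    # 95% Wald interval built on the log scale, reported on the ratio scale.
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)

# Hypothetical groups with equal log-scale standard errors.
log_hr1, se1 = math.log(0.60), 0.10
log_hr2, se2 = math.log(0.85), 0.10

hr1, lo1, hi1 = wald_ci(log_hr1, se1)
hr2, lo2, hi2 = wald_ci(log_hr2, se2)
print(f"Group 1: HR {hr1:.2f} ({lo1:.2f} to {hi1:.2f})")
print(f"Group 2: HR {hr2:.2f} ({lo2:.2f} to {hi2:.2f})")
print("Intervals overlap:", max(lo1, lo2) < min(hi1, hi2))   # True: they overlap

# Test of equal effects, carried out on the log scale.
diff = log_hr1 - log_hr2
se_diff = math.sqrt(se1**2 + se2**2)
z_stat = diff / se_diff
p = 2 * (1 - NormalDist().cdf(abs(z_stat)))
print(f"z = {z_stat:.2f}, p = {p:.3f}")   # p is well below 0.05 despite the overlap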
In that setting, asking whether the intervals “overlap” is not just unreliable — it’s ill-defined.
This is not a niche problem
Importantly, this is not about bad statistics or sloppy analyses.
You can find this exact pattern in well-conducted secondary analyses published in journals like JAMA and JAMA Network Open: asymmetric confidence intervals shown side by side, interaction tests reported (and non-significant), and yet a strong visual pull toward subgroup comparisons.
The analyses are fine. The plots are fine. The trouble lies entirely in how easily the human visual system turns those plots into informal hypothesis tests.
The real lesson for practice
So what should we actually learn from this?
Not that confidence intervals are useless. Not that forest plots are misleading. And not that everyone is doing statistics wrong.
The lesson is simpler and more uncomfortable:
Once confidence intervals are asymmetric, “overlap” is no longer an inferential concept.
At that point, visual comparison answers no well-defined statistical question. If you want to know whether two effects differ, you need a confidence interval for their difference (or ratio), or a formal interaction test.
Anything else is pattern recognition masquerading as inference.
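For the subgroup example above, both of those can be approximated directly from the published numbers, assuming the reported intervals are normal-approximation intervals on the log scale (the usual construction for hazard ratios). A minimal sketch in Python:

import math
from statistics import NormalDist

def se_from_ci(lower, upper, z=1.96):
    # Recover the log-scale standard error from a reported 95% CI for a ratio,
    # assuming a symmetric normal-approximation interval on the log scale.
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Reported subgroup results from the example above.
hr_a, lo_a, hi_a = 0.72, 0.57, 0.92
hr_b, lo_b, hi_b = 0.92, 0.64, 1.31

se_a, se_b = se_from_ci(lo_a, hi_a), se_from_ci(lo_b, hi_b)

# Contrast of the two effects: the ratio of hazard ratios, with its own 95% CI.
diff = math.log(hr_a) - math.log(hr_b)
se_diff = math.sqrt(se_a**2 + se_b**2)
ratio = math.exp(diff)
lo, hi = math.exp(diff - 1.96 * se_diff), math.exp(diff + 1.96 * se_diff)

# Wald-type interaction test on the log scale.
z_stat = diff / se_diff
p = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# With these inputs: ratio of HRs is about 0.78 (95% CI roughly 0.51 to 1.20), p about 0.26.
print(f"Ratio of HRs: {ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"Interaction test: z = {z_stat:.2f}, p = {p:.2f}")

The contrast interval comfortably includes 1 and the reconstructed interaction test is clearly non-significant, which is the formal version of the comparison the eye was trying to make informally.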
A rule that actually helps
If you want a single practical rule to take away, it’s this:
Confidence intervals can be interpreted against their null value. They cannot, in general, be interpreted against each other — especially when they are asymmetric.
That rule won’t make figures prettier. But it will prevent a surprising number of subtle misinterpretations.
And in applied statistics, that’s usually the best kind of improvement.
What to do instead (in one sentence)
If your real question is “Are these two effects different?”, compute a confidence interval for the contrast of effects (or run an explicit interaction test). Do not try to answer that question by eyeballing whether two asymmetric confidence intervals overlap.