When confidence intervals stop meaning what you think they mean

confidence intervals
inference
visualization
On asymmetric confidence intervals and the limits of inference by eye
Author: Daniel Koska

Published: January 3, 2026

Every once in a while, I come across the same piece of statistical advice:

“Don’t judge differences by whether confidence intervals overlap”.

At first glance, that sounds like one of those annoyingly technical rules statisticians like to repeat without explaining properly. For a long time, I more or less accepted it, but never really sat down to work through why it is true, when it matters, and what exactly goes wrong. This post is my attempt to do that.

Asymmetric confidence intervals are everywhere

Confidence intervals are often asymmetric. This is not some strange edge case. It is completely routine. Think of:

  • odds ratios

  • hazard ratios

  • rate ratios

  • quantiles

  • bootstrap intervals

  • profile-likelihood intervals

These intervals are often perfectly fine. They are computed correctly, reported correctly, and for a single estimate they are usually interpreted correctly. In particular, asking whether such an interval includes the null value is often a perfectly sensible inferential step.

Where things quietly go off the rails

Whether symmetric or not, the overlap between two separate confidence intervals is often used as a visual shortcut for deciding whether two effects differ. But that shortcut can easily disagree with the result of a corresponding statistical test.

The reason is simple: the statistical test does not ask whether two displayed intervals overlap. It asks whether the contrast between the two effects is compatible with zero on the relevant scale, for example whether \(A - B = 0\), whether \(\log(A/B) = 0\), or, in a subgroup analysis, whether an interaction term is zero. That is a different inferential question from “do the two displayed intervals overlap?”.

Let’s have a closer look at this. Suppose a subgroup analysis reports something like this:

Subgroup A: \(\mathrm{HR} = 0.72\) (95% CI 0.57–0.92)

Subgroup B: \(\mathrm{HR} = 0.92\) (95% CI 0.64–1.31)

\(p_{\text{interaction}} = 0.23\)

Formally, the message is straightforward: there is no convincing evidence that the treatment effect differs between subgroups. Visually, however, the message feels different. One interval excludes 1, the other does not. That creates a strong pull toward a different interpretation, something like:

“Looks like the treatment works in subgroup A, but not in subgroup B.”

That conclusion is very tempting. It is also not what the interaction test says. And this is where the trouble starts. The eye is informally comparing two separate confidence intervals, while the formal analysis is answering a different question.
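We can make the formal question concrete. As a rough check (assuming the reported intervals are Wald intervals on the log hazard-ratio scale), the standard errors can be recovered from the CI widths and the contrast tested directly:

```r
# Recover log-scale standard errors from the reported 95% CIs
# (assumes Wald intervals on the log-HR scale; illustrative numbers)
se_A <- (log(0.92) - log(0.57)) / (2 * 1.96)
se_B <- (log(1.31) - log(0.64)) / (2 * 1.96)

# Wald test of the contrast log(HR_A) - log(HR_B)
z <- (log(0.72) - log(0.92)) / sqrt(se_A^2 + se_B^2)
p <- 2 * pnorm(-abs(z))
round(p, 2)  # roughly 0.26
```

This back-of-envelope reconstruction will not exactly reproduce a model-based interaction test (here it gives roughly 0.26 rather than the reported 0.23), but it makes the point: the relevant statistic is built from the contrast, not from the two separate intervals.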

The key mistake

A confidence interval for effect A and a confidence interval for effect B are not the same thing as a confidence interval for the contrast between them. That contrast might be:

  • \(A - B\),
  • \(\log(A/B)\),
  • or some other difference depending on the model and effect measure.

These are different inferential objects. And this is the real reason why CI overlap is unreliable. Even for symmetric 95% confidence intervals, overlap does not correspond neatly to a 5% test of equality. Once intervals are asymmetric on the displayed scale, the visual shortcut becomes even less trustworthy.
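One way to see why the 5% correspondence fails: for two independent estimates with equal standard error \(s\), two 95% intervals just touch when the estimates are \(2 \times 1.96\,s\) apart, while the standard error of their difference is only \(s\sqrt{2}\). A quick calculation shows what significance level non-overlap actually corresponds to in this idealized case:

```r
# Two independent estimates with equal SE s: intervals just touch when
# |theta1 - theta2| = 2 * 1.96 * s, so the z statistic for the contrast is
z_touch <- 2 * 1.96 / sqrt(2)   # ~2.77, since SE of the difference is s * sqrt(2)
2 * pnorm(-z_touch)             # ~0.006, far stricter than 0.05
```

So demanding non-overlap of two 95% intervals implicitly tests at roughly the 0.5% level, not 5%. With unequal standard errors or asymmetric intervals, even this rough correspondence breaks down.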

Why asymmetry makes the problem worse

To be clear: asymmetry does not destroy inference. What it destroys is the comforting illusion that inference by eye is geometrically straightforward. Once intervals are asymmetric on the plotted scale, several things become harder to interpret visually:

  • the point estimate is no longer centered in the interval,
  • left and right distances no longer mean the same thing,
  • the apparent distance between intervals depends on the chosen scale,
  • and overlap is no longer tied to any simple testing rule.

That last point matters a lot.

For ratio measures such as odds ratios or hazard ratios, intervals are often constructed on the log scale and then back-transformed. On the log scale, the interval may be perfectly symmetric. On the original scale, it becomes asymmetric. So even the visual geometry depends on the parameterization. That alone should make us suspicious of the idea that overlap on the displayed scale has some stable inferential meaning. Usually, it does not.
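A minimal sketch of that back-transformation, using the hazard ratio 0.72 from the earlier example and an illustrative standard error of 0.12 on the log scale:

```r
theta <- log(0.72)  # point estimate on the log scale
se    <- 0.12       # illustrative standard error (assumed, not from any real model)

ci_log   <- theta + c(-1.96, 1.96) * se  # symmetric around theta
ci_ratio <- exp(ci_log)                  # asymmetric around exp(theta)

exp(theta) - ci_ratio[1]  # left distance on the ratio scale: ~0.15
ci_ratio[2] - exp(theta)  # right distance on the ratio scale: ~0.19
```

The interval is perfectly symmetric on the log scale, yet on the ratio scale the point estimate sits visibly off-center. Nothing is wrong with the interval; only the displayed geometry has changed.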

A simple simulation

To make this more concrete, let us simulate a situation in which two subgroup-specific effects are estimated independently.

I simulate two log-effects, as one might obtain from two subgroup analyses. On the log scale, standard Wald inference is straightforward. I then exponentiate the estimates and intervals to obtain ratio measures with asymmetric confidence intervals on the original scale.

The question is: how often do two 95% confidence intervals overlap on the ratio scale even though the formal test comparing the two log-effects is statistically significant?

Code
library(tidyverse)
library(knitr)

set.seed(1)

n_sim <- 20000

# True effects on the log scale
mu1 <- log(0.60)
mu2 <- log(1.10)

# Standard errors of the subgroup-specific log-effects
se1 <- 0.17
se2 <- 0.17

sim <- tibble(
  theta1 = rnorm(n_sim, mu1, se1),
  theta2 = rnorm(n_sim, mu2, se2)
) %>%
  mutate(
    est1 = exp(theta1),
    est2 = exp(theta2),

    l1 = exp(theta1 - 1.96 * se1),
    u1 = exp(theta1 + 1.96 * se1),
    l2 = exp(theta2 - 1.96 * se2),
    u2 = exp(theta2 + 1.96 * se2),

    overlap = pmax(l1, l2) <= pmin(u1, u2),

    z_diff = (theta1 - theta2) / sqrt(se1^2 + se2^2),
    p_diff = 2 * pnorm(-abs(z_diff)),
    sig_diff = p_diff < 0.05
  )

summary_table_num <- sim %>%
  summarise(
    `Intervals overlap` = mean(overlap),
    `Formal comparison is significant` = mean(sig_diff),
    `Overlap despite significant difference` = mean(overlap & sig_diff),
    `Overlap among significant comparisons` = mean(overlap & sig_diff) / mean(sig_diff)
  )

summary_table <- summary_table_num %>%
  mutate(across(everything(), ~ paste0(round(100 * .x, 1), "%")))

kable(summary_table, align = "cccc")
Table 1: Simulation results (20,000 runs)

  Intervals overlap:                        59.7%
  Formal comparison is significant:         71.1%
  Overlap despite significant difference:   30.9%
  Overlap among significant comparisons:    43.4%

The key result in Table 1 is Overlap despite significant difference. Here, it is 30.9%. So in nearly one third of all simulated datasets, the two asymmetric confidence intervals overlap even though the formal test indicates a statistically significant difference between effects.

That is not a small technical exception. It means that visual overlap can easily coexist with formal evidence for a difference, so “the intervals overlap” is clearly not a reliable shorthand for “there is no difference”.

The final column, Overlap among significant comparisons, shows the same issue from another angle. Although the formal comparison is significant in 71.1% of simulations, 43.4% of those significant cases still display overlapping confidence intervals. Put differently, even when the data support a difference statistically, the visual impression from the two separate intervals is often ambiguous or misleading.

One plotted example

We can also extract one concrete simulation run where this happens and plot it.

Code
example_row <- sim %>%
  filter(overlap, sig_diff) %>%
  slice(1)

example_plot_data <- tibble(
  subgroup = c("A", "B"),
  estimate = c(example_row$est1, example_row$est2),
  lower = c(example_row$l1, example_row$l2),
  upper = c(example_row$u1, example_row$u2)
) %>%
  mutate(across(-subgroup, ~ round(.x, 3)))

example_p <- example_row$p_diff[[1]]

knitr::kable(
  example_plot_data,
  align = c("c", "c", "c", "c"),
  caption = "One simulated example with overlapping asymmetric confidence intervals despite a statistically significant difference test."
)
One simulated example with overlapping asymmetric confidence intervals despite a statistically significant difference test.

  subgroup   estimate   lower   upper
  A          0.619      0.444   0.864
  B          1.147      0.822   1.600
Code
ggplot(example_plot_data, aes(x = estimate, y = subgroup)) +
  geom_point(size = 2.8) +
  geom_errorbar(
    aes(xmin = lower, xmax = upper),
    orientation = "y",
    width = 0.12
  ) +
  geom_vline(xintercept = 1, linetype = 2) +
  labs(
    x = "Effect estimate (ratio scale)",
    y = NULL,
    title = "Overlapping asymmetric confidence intervals",
    subtitle = paste0(
      "But the formal test comparing effects gives p = ",
      formatC(example_p, digits = 3, format = "f")
    )
  ) +
  theme_minimal(base_size = 12)

The table presents one concrete simulated dataset in which the two subgroup-specific confidence intervals overlap, even though the formal test comparing the effects is statistically significant. So while confidence intervals can often be interpreted against their null value, they generally should not be interpreted against each other by eye.

That is true in general, and it becomes especially important when the intervals are asymmetric on the displayed scale. The more asymmetric the displayed intervals become, the easier it is to forget that the visual comparison is happening on a scale whose geometry may have little to do with the inferential question we actually care about.

What to do instead

If the real scientific question is whether two effects differ, then the right response is not to stare harder at two separate confidence intervals. Instead, compute one of the following:

  • a confidence interval for the difference in effects,
  • a confidence interval for the ratio of effects,
  • or an explicit interaction test.

In other words: answer the question you actually care about with the inferential object that actually corresponds to it.
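For the subgroup example from earlier, one such object is a confidence interval for the ratio of the two hazard ratios. A sketch with the reported numbers, again assuming Wald intervals on the log scale:

```r
# Standard errors recovered from the reported subgroup 95% CIs
se_A <- (log(0.92) - log(0.57)) / (2 * 1.96)
se_B <- (log(1.31) - log(0.64)) / (2 * 1.96)

# 95% CI for the ratio of effects, HR_A / HR_B
contrast <- log(0.72) - log(0.92)
se_c     <- sqrt(se_A^2 + se_B^2)
exp(contrast + c(-1.96, 1.96) * se_c)  # roughly 0.51 to 1.20
```

This single interval answers the comparative question directly: it comfortably includes 1, in line with the unremarkable interaction p-value, even though one of the two subgroup intervals excludes 1.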

Asymmetric confidence intervals do not break inference. But they do break the intuition that visual overlap is telling us something simple. And that is exactly why they deserve a bit more caution than they usually get.