Binomial confidence intervals: exact vs. approximate

This graph and R code compares the exact vs. normal approximations for 95% binomial confidence intervals for n trials with either one success or 50% success using the R functions binom.test() and prop.test().

```# Return a data frame of the 95% confidence intervals using
# the exact binomial, an approximation, and the difference.
# n.min = minimum number of trials
# n.max = maximum number of trials
# x = number of successes (do not use with p)
# p = proportion of successes (do not use with x)
binomial.ci <- function(n.min=10, n.max=100, x=NA, p=NA)
{
if(1 != sum(is.na(x), is.na(p))) stop('specify one of x or p')
stopifnot(n.max>n.min)
binom <- data.frame()
for (n in c(n.min:n.max))
{
if(!is.na(p))
x <- floor(n * p)

exact.lower <- binom.test(x, n)\$conf.int[1]
exact.upper <- binom.test(x, n)\$conf.int[2]
approx.lower <- prop.test(x, n)\$conf.int[1]
approx.upper <- prop.test(x, n)\$conf.int[2]
brow <- data.frame(n, x, exact.lower, exact.upper, approx.lower, approx.upper)
binom <- rbind(brow, binom)
}
binom\$error.lower <- binom\$exact.lower - binom\$approx.lower
binom\$error.upper <- binom\$exact.upper - binom\$approx.upper
return(binom)
}

# One success out of n trials
x1 <- binomial.ci(n.min = 10, n.max=100, x=1)
# 50% success out of n trials
p50 <- binomial.ci(n.min = 10, n.max=100, p=0.50)

# Combine and reshape data.frames to a long format
binom.long <- rbind(
data.frame(line='Lower limit, one success', x=x1\$n, y=x1\$error.lower),
data.frame(line='Upper limit, one success', x=x1\$n, y=x1\$error.upper),
data.frame(line='Lower limit, 50% success', x=x1\$n, y=p50\$error.lower),
data.frame(line='Upper limit, 50% success', x=x1\$n, y=p50\$error.upper)
)

# Plot all four lines together in a single graph
require(ggplot2)
ggplot(binom.long, aes(x=x, y=y, color=line)) +
geom_line() +
theme_bw() +
theme(
panel.grid = element_blank(),
panel.border = element_blank(),
legend.key = element_blank(), # border around legend keys,
legend.position=c(.8,.8), # legend in upper right
legend.title=element_blank() # remove legend title
) +
labs(
y='Difference between exact and approximation',
x='Number of trials',
title='Binomial confidence intervals: exact vs. approximate'
)
```

The error is largest when n is small and for p=0.50%, and for the one success confidence interval, the difference in the upper limit is much larger than the lower limit. The CI differences for p=50% are symmetric, while the normal approximation for x=1 is consistently too high for both the lower and upper confidence limits.

So why use the normal approximation? First, when calculating by hand, the normal approximation is easier to calculate because of the simpler formula. Second, you may need to match a calculation in a class or in a book. Third, it’s even faster for R to calculate the normal approximation as shown here with 100,000 trials: the normal approximation is 62x faster than the binomial, though the difference is only a second.

```> system.time(
+ for (x in 1:100) binom.test(1,100000)
+ )
user  system elapsed
1.24    0.00    1.24
>
> system.time(
+ for (x in 1:100) prop.test(1,100000)
+ )
user  system elapsed
0.01    0.00    0.02
> 1.24/0.02
[1] 62
```

Notes: prop.test() by default, as used here, uses Yates’ continuity correction. Also, see the binom.confint() function in the binom package for eight ways to compute the confidence intervals. This code was tested with R 3.0.2 on Windows 7 with ggplot2 0.9.3.1.