Binomial confidence intervals: exact vs. approximate

This graph and R code compares the exact vs. normal approximations for 95% binomial confidence intervals for n trials with either one success or 50% success using the R functions binom.test() and prop.test().

# Return a data frame of the 95% confidence intervals using
# the exact binomial, an approximation, and the difference.
# n.min = minimum number of trials
# n.max = maximum number of trials
# x = number of successes (do not use with p)
# p = proportion of successes (do not use with x) <- function(n.min=10, n.max=100, x=NA, p=NA)
	if(1 != sum(, stop('specify one of x or p')
	binom <- data.frame()
	for (n in c(n.min:n.max))
			x <- floor(n * p)

		exact.lower <- binom.test(x, n)$[1]
		exact.upper <- binom.test(x, n)$[2]
		approx.lower <- prop.test(x, n)$[1]
		approx.upper <- prop.test(x, n)$[2]
		brow <- data.frame(n, x, exact.lower, exact.upper, approx.lower, approx.upper)	
		binom <- rbind(brow, binom)
	binom$error.lower <- binom$exact.lower - binom$approx.lower
	binom$error.upper <- binom$exact.upper - binom$approx.upper

# One success out of n trials
x1 <- = 10, n.max=100, x=1)
# 50% success out of n trials
p50 <- = 10, n.max=100, p=0.50)

# Combine and reshape data.frames to a long format
binom.long <- rbind(
	data.frame(line='Lower limit, one success', x=x1$n, y=x1$error.lower),
	data.frame(line='Upper limit, one success', x=x1$n, y=x1$error.upper),
	data.frame(line='Lower limit, 50% success', x=x1$n, y=p50$error.lower),
	data.frame(line='Upper limit, 50% success', x=x1$n, y=p50$error.upper)

# Plot all four lines together in a single graph
ggplot(binom.long, aes(x=x, y=y, color=line)) + 
	geom_line() + 
	theme_bw() +
		panel.grid = element_blank(),
		panel.border = element_blank(),
		legend.key = element_blank(), # border around legend keys,
		legend.position=c(.8,.8), # legend in upper right 
		legend.title=element_blank() # remove legend title
		) +
		y='Difference between exact and approximation', 
		x='Number of trials', 
		title='Binomial confidence intervals: exact vs. approximate'

Binomal confidence intervals: exact method vs. normal approximation

The error is largest when n is small and for p=0.50%, and for the one success confidence interval, the difference in the upper limit is much larger than the lower limit. The CI differences for p=50% are symmetric, while the normal approximation for x=1 is consistently too high for both the lower and upper confidence limits.

So why use the normal approximation? First, when calculating by hand, the normal approximation is easier to calculate because of the simpler formula. Second, you may need to match a calculation in a class or in a book. Third, it’s even faster for R to calculate the normal approximation as shown here with 100,000 trials: the normal approximation is 62x faster than the binomial, though the difference is only a second.

> system.time(
+ for (x in 1:100) binom.test(1,100000)
+ )
   user  system elapsed 
   1.24    0.00    1.24 
> system.time(
+ for (x in 1:100) prop.test(1,100000)
+ )
   user  system elapsed 
   0.01    0.00    0.02 
> 1.24/0.02
[1] 62

Notes: prop.test() by default, as used here, uses Yates’ continuity correction. Also, see the binom.confint() function in the binom package for eight ways to compute the confidence intervals. This code was tested with R 3.0.2 on Windows 7 with ggplot2


One thought on “Binomial confidence intervals: exact vs. approximate

  1. Hi, nice post, code, and graph (found you from r-bloggers). I think you presume too much that the exact CI is best. See for example A. Agresti and Y. Min Biometrics 2001 57(3):963-971.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s