Confidence interval diagram in R

This code shows how to easily plot a beautiful confidence interval diagram in R.

First, let’s input the raw data. We’ll be making two confidence intervals, one for each of two samples of 10 observations. In case you are curious, the data represent samples from a survey of how many minutes it takes to drive from home to school at a small college.

sample1 <- c(16, 36, 12, 36, 9, 19, 27, 54, 24, 20)
sample2 <- c(20, 31, 30, 34, 12, 7, 23, 19, 25, 15)

Next, create a data frame (R’s representation of a table) with three columns that summarize the raw data. There are two rows for the two intervals, but you can have as many rows as you need (one or more). The first column holds the labels of the two samples. The second is the sample mean, and the third (sample.e) is the sample standard deviation, as computed by sd(). Note that this is not the standard error, which would be the standard deviation divided by the square root of the sample size.

df <- data.frame(
  sample.number = as.factor(c("sample1", "sample2")),
  sample.mean = c(mean(sample1), mean(sample2)),
  sample.e = c(sd(sample1), sd(sample2))
)
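Since the text says the table can have as many rows as you need, here is a minimal sketch of building the same summary from a named list of samples instead of typing each row by hand. The names samples, summarize_samples, and summary_df are hypothetical and are not part of the original code; the column names match the data frame above.

```r
# Hypothetical helper (not from the original post): summarize any number
# of samples held in a named list into one row per sample.
samples <- list(
  sample1 = c(16, 36, 12, 36, 9, 19, 27, 54, 24, 20),
  sample2 = c(20, 31, 30, 34, 12, 7, 23, 19, 25, 15)
)

summarize_samples <- function(samples) {
  data.frame(
    sample.number = factor(names(samples)),
    # vapply() returns one number per sample; unname() drops the list names
    sample.mean = unname(vapply(samples, mean, numeric(1))),
    sample.e = unname(vapply(samples, sd, numeric(1)))
  )
}

summary_df <- summarize_samples(samples)
print(summary_df)
```

Adding a third sample is then just a matter of adding another named element to the list.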

Next, install ggplot2 (if necessary) and load it.

# installation only needs to be done once (it persists until R is upgraded)
install.packages('ggplot2')

# library() must be called once per session to load the package
library(ggplot2)

Sample standard deviation

Finally, invoke ggplot to draw the intervals. Here each bound is simply the sample mean plus or minus one sample standard deviation, so strictly speaking this is not yet a confidence interval:

ggplot(df, aes(sample.number, sample.mean,
               ymin = sample.mean - sample.e,
               ymax = sample.mean + sample.e)) +
  geom_crossbar(width = 0.5) +
  coord_flip()
# geom_crossbar() draws a box from ymin to ymax with a line at the mean
# coord_flip() flips the axes so the intervals run horizontally

95% confidence interval

The plot above shows the mean plus or minus one sample standard deviation, but in many cases you want a 95% confidence interval for the mean:

\text{interval} = \bar{x} \pm t \left( \frac{s}{\sqrt{n}} \right)

Here t is the t critical value with n - 1 degrees of freedom (the df argument of qt(), not to be confused with the data frame named df), s is the sample standard deviation, and n is the size of the sample. This requires just a few more calculations:

# calculate n for convenience (both samples have the same size, 10)
n <- length(sample1)

# add 95% confidence bound as a new column to existing data frame
# for a 95% confidence interval, use 0.975 (not 0.95)
# for explanation, see <https://stat.ethz.ch/pipermail/r-help/2008-June/164286.html>
df$sample.bound <- c(
        qt(p=0.975, df=n-1) * sd(sample1) / sqrt(n),
        qt(p=0.975, df=n-1) * sd(sample2) / sqrt(n))

# plot 95% confidence interval
ggplot(df, aes(sample.number, sample.mean,
               ymin = sample.mean - sample.bound,
               ymax = sample.mean + sample.bound)) +
  geom_crossbar(width = 0.5) +
  coord_flip()
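As a sanity check on the hand calculation, base R’s t.test() reports a confidence interval for a one-sample mean that should agree with the formula used here. The sketch below repeats the calculation for the first sample only; the names bound, manual_ci, and builtin_ci are introduced just for this illustration.

```r
# Compare the hand-computed 95% interval with the one t.test() reports.
sample1 <- c(16, 36, 12, 36, 9, 19, 27, 54, 24, 20)
n <- length(sample1)

# same formula as in the post: t critical value times s / sqrt(n)
bound <- qt(p = 0.975, df = n - 1) * sd(sample1) / sqrt(n)
manual_ci <- c(mean(sample1) - bound, mean(sample1) + bound)

# as.numeric() drops the conf.level attribute so the vectors compare cleanly
builtin_ci <- as.numeric(t.test(sample1, conf.level = 0.95)$conf.int)

print(manual_ci)
print(builtin_ci)
```

If the two printed intervals match, the sample.bound column is doing what the formula says.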

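Notice the latter plot has narrower intervals. With n = 10, qt(0.975, df=n-1)/sqrt(n) is about 0.72, so each 95% bound is about 0.72 times the sample standard deviation used in the first plot.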
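The factor that relates the two plots can be checked directly. This is a small sketch assuming df = n - 1 with n = 10, as in the samples here; the names multiplier and sizes are made up for the example.

```r
# Ratio of the 95% bound to the sample standard deviation for n = 10.
n <- 10
multiplier <- qt(p = 0.975, df = n - 1) / sqrt(n)
print(round(multiplier, 2))  # about 0.72

# The same ratio for a few other sample sizes (hypothetical values),
# showing how the interval tightens as n grows.
sizes <- c(5, 10, 25, 100)
print(data.frame(n = sizes,
                 multiplier = qt(p = 0.975, df = sizes - 1) / sqrt(sizes)))
```

Whenever this ratio is below 1, the 95% interval is narrower than the mean plus or minus one standard deviation.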

Thank you to Hadley Wickham for getting me started.

4 thoughts on “Confidence interval diagram in R”

    • Ultimately it’s a matter of preference, but for me, the confidence interval drawn by geom_linerange is so thin it gets lost in the rest of the plot—so much that at a glance the plot looks blank—so I prefer the bigger geom_crossbar. Also, I prefer the line clearly marking the middle point.
