This code shows how to easily plot a beautiful confidence interval diagram in R.
First, let’s input the raw data. We’ll be making two confidence intervals for two samples of 10. In case you are curious, the data represents samples from a survey of how many minutes it takes to drive from home to school at a small college.
sample1 <- c(16, 36, 12, 36, 9, 19, 27, 54, 24, 20)
sample2 <- c(20, 31, 30, 34, 12, 7, 23, 19, 25, 15)
Next, create a data frame (R’s representation of a table) with three columns representing a summary of the raw data. There are two rows for the two confidence intervals, but you can have as many rows as you need (one or more). The first column in the data frame holds the labels of the two samples. The second is the sample mean, and the third is the sample standard deviation.
df <- data.frame(
  sample.number = as.factor(c("sample1", "sample2")),
  sample.mean = c(mean(sample1), mean(sample2)),
  sample.e = c(sd(sample1), sd(sample2))
)
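Since the data frame can have as many rows as you need, here is a minimal sketch of building the same three columns from a named list of samples. The helper name summarise_samples is hypothetical (it is not part of the original code or of any package); it only wraps base R's mean() and sd().

```r
# Hypothetical helper (an assumption for illustration): summarise a named
# list of numeric samples into the three-column layout used above.
summarise_samples <- function(samples) {
  data.frame(
    sample.number = factor(names(samples)),
    sample.mean   = unname(vapply(samples, mean, numeric(1))),
    sample.e      = unname(vapply(samples, sd, numeric(1)))
  )
}

samples <- list(
  sample1 = c(16, 36, 12, 36, 9, 19, 27, 54, 24, 20),
  sample2 = c(20, 31, 30, 34, 12, 7, 23, 19, 25, 15)
)

# One row per sample; add more entries to the list to get more rows
df <- summarise_samples(samples)
print(df)
```

Adding a third sample is then just one more entry in the list, and the plotting code stays unchanged.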
Next, install ggplot2 (if necessary) and load it.
# installation is remembered until R is upgraded
install.packages('ggplot2')
# require() is required once per session to load the package
require(ggplot2)
Sample standard deviation
Finally, invoke ggplot to draw the confidence intervals where the bound is simply the sample standard deviation:
# geom_crossbar() uses cross bars to show the mean and CI points
# coord_flip() flips the axes
ggplot(df, aes(sample.number, sample.mean,
               ymin = sample.mean - sample.e,
               ymax = sample.mean + sample.e)) +
  geom_crossbar(width = 0.5) +
  coord_flip()
95% confidence interval
Above you have the interval drawn as the mean plus or minus the sample standard deviation, but in some cases you want a true 95% confidence interval:
$\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$
where t is the t critical value based on df = n - 1, s is the sample standard deviation, and n is the size of the sample. This requires just a few more calculations:
# calculate n for convenience (both samples have the same size)
n <- length(sample1)
# add 95% confidence bound as a new column to existing data frame
# for a 95% confidence interval, use 0.975 (not 0.95)
# for explanation, see <https://stat.ethz.ch/pipermail/r-help/2008-June/164286.html>
df$sample.bound <- c(
  qt(p = 0.975, df = n - 1) * sd(sample1) / sqrt(n),
  qt(p = 0.975, df = n - 1) * sd(sample2) / sqrt(n)
)
# plot 95% confidence interval
ggplot(df, aes(sample.number, sample.mean,
               ymin = sample.mean - sample.bound,
               ymax = sample.mean + sample.bound)) +
  geom_crossbar(width = 0.5) +
  coord_flip()
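As a sanity check on the bound calculation, the sketch below compares the hand-computed interval for the first sample against the one reported by base R's t.test(). The variable names manual and builtin are mine, chosen for illustration; the two should agree because a one-sample t.test() uses the same t-based formula.

```r
sample1 <- c(16, 36, 12, 36, 9, 19, 27, 54, 24, 20)
n <- length(sample1)

# Hand-computed 95% bound: t critical value times the standard error
bound <- qt(p = 0.975, df = n - 1) * sd(sample1) / sqrt(n)
manual <- c(mean(sample1) - bound, mean(sample1) + bound)

# Interval reported by t.test(); as.numeric() drops the conf.level attribute
builtin <- as.numeric(t.test(sample1, conf.level = 0.95)$conf.int)

# TRUE when the two intervals match up to floating-point tolerance
print(isTRUE(all.equal(manual, builtin)))
```

If the two ever disagree, the usual culprits are using 0.95 instead of 0.975 in qt() or forgetting to divide by sqrt(n).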
Notice the latter plot has narrower intervals because the bound is the standard deviation scaled by qt(0.975, df=n-1)/sqrt(n), which for n = 10 is about 0.72.
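The scaling factor can be checked directly. This is a small sketch, assuming n = 10 as in the two samples above; the name multiplier is introduced here for illustration only.

```r
n <- 10  # size of each sample in this post

# Factor by which the standard deviation is scaled to get the 95% bound
multiplier <- qt(p = 0.975, df = n - 1) / sqrt(n)
print(round(multiplier, 2))  # about 0.72

# Any value below 1 means the 95% interval is narrower than mean +/- sd
print(multiplier < 1)
```

The factor depends on n, so for very small samples it can exceed 1 and the 95% interval would then be wider than the standard deviation bars.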
Thank you to Hadley Wickham for getting me started.
What’s wrong with geom_linerange .. ?
Ultimately it’s a matter of preference, but for me, the confidence interval drawn by geom_linerange is so thin it gets lost in the rest of the plot—so much that at a glance the plot looks blank—so I prefer the bigger geom_crossbar. Also, I prefer the line clearly marking the middle point.
I get a different first plot. ymin and ymax look off.
And re: “qt(0.975,df=25-1)/sqrt(n)” isn’t n=10, not 25?