This code shows how to easily plot a beautiful confidence interval diagram in R.
First, let’s input the raw data. We’ll be making two confidence intervals for two samples of 10. In case you are curious, the data come from a survey of how many minutes it takes to drive from home to school at a small college.
sample1 <- c(16, 36, 12, 36, 9, 19, 27, 54, 24, 20)
sample2 <- c(20, 31, 30, 34, 12, 7, 23, 19, 25, 15)
Next, create a data frame (R's representation of a table) with three columns summarizing the raw data. There are two rows for the two confidence intervals, but you can have as many rows as you need (one or more). The first column in the data frame holds the labels of the two samples. The second is the sample mean, and the third is the sample standard deviation, as computed by sd().
df <- data.frame(
  sample.number = as.factor(c("sample1", "sample2")),
  sample.mean = c(mean(sample1), mean(sample2)),
  sample.e = c(sd(sample1), sd(sample2))
)
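Because the data frame can hold any number of rows, the summary can also be built from a named list of samples instead of typing out each sample by hand. The following is a minimal sketch under that assumption; the names `samples` and `df_alt` are introduced here for illustration and are not part of the original code.

```r
# Raw data gathered into a named list (hypothetical name: samples)
samples <- list(
  sample1 = c(16, 36, 12, 36, 9, 19, 27, 54, 24, 20),
  sample2 = c(20, 31, 30, 34, 12, 7, 23, 19, 25, 15)
)

# One row per sample: label, mean, and sample standard deviation
df_alt <- data.frame(
  sample.number = factor(names(samples)),
  sample.mean   = sapply(samples, mean),
  sample.e      = sapply(samples, sd),
  row.names     = NULL
)

print(df_alt)
```

Adding a third sample then only requires adding one more entry to the list.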
Next, install ggplot2 (if necessary) and load it.
# installation is remembered until R is upgraded
install.packages('ggplot2')
# require() is required once per session to load the package
require(ggplot2)
Sample standard deviation
Finally, invoke ggplot to draw the confidence intervals where the bound is simply the sample standard deviation:
# geom_crossbar() uses cross bars to show the mean and CI points
# coord_flip() flips the axes
ggplot(df, aes(sample.number, sample.mean,
               ymin = sample.mean - sample.e,
               ymax = sample.mean + sample.e)) +
  geom_crossbar(width = 0.5) +
  coord_flip()
95% confidence interval
Above you have the interval drawn as the mean plus or minus the sample standard deviation, but in some cases you want a 95% confidence interval for the mean:

$$\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, t is the t critical value based on df = n - 1 degrees of freedom, s is the sample standard deviation, and n is the size of the sample. This requires just a few more calculations:
# calculate n for convenience
n <- length(sample1)

# add 95% confidence bound as a new column to existing data frame
# for a 95% confidence interval, use 0.975 (not 0.95)
# for explanation, see <https://stat.ethz.ch/pipermail/r-help/2008-June/164286.html>
df$sample.bound <- c(
  qt(p = 0.975, df = n - 1) * sd(sample1) / sqrt(n),
  qt(p = 0.975, df = n - 1) * sd(sample2) / sqrt(n)
)

# plot 95% confidence interval
ggplot(df, aes(sample.number, sample.mean,
               ymin = sample.mean - sample.bound,
               ymax = sample.mean + sample.bound)) +
  geom_crossbar(width = 0.5) +
  coord_flip()
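As a sanity check on the bound calculation, the hand-computed interval can be compared with the one reported by base R's t.test(). This is only a sketch: `ci_bound` is a hypothetical helper introduced here, not something the original code defines.

```r
sample1 <- c(16, 36, 12, 36, 9, 19, 27, 54, 24, 20)

# Half-width of a two-sided confidence interval for the mean
# (hypothetical helper; level defaults to 95%)
ci_bound <- function(x, level = 0.95) {
  n <- length(x)
  p <- 1 - (1 - level) / 2   # 0.975 for a 95% interval
  qt(p, df = n - 1) * sd(x) / sqrt(n)
}

# Interval computed by hand: mean plus or minus the bound
manual <- mean(sample1) + c(-1, 1) * ci_bound(sample1)

# Interval from the one-sample t-test (default conf.level = 0.95)
builtin <- as.numeric(t.test(sample1)$conf.int)

print(manual)
print(builtin)
print(all.equal(manual, builtin))
```

If the two intervals agree, the sample.bound column is being computed as intended.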
Notice that the latter plot has narrower intervals because the multiplier qt(0.975, df=10-1)/sqrt(10) is about 0.72, so each bound is smaller than the sample standard deviation itself.
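The multiplier applied to the sample standard deviation can be checked directly for any sample size. A minimal sketch, where `ci_multiplier` is a hypothetical helper name and the samples of 10 from this post give df = 9:

```r
# Multiplier applied to the sample standard deviation in a confidence interval
# (hypothetical helper; level defaults to 95%)
ci_multiplier <- function(n, level = 0.95) {
  qt(1 - (1 - level) / 2, df = n - 1) / sqrt(n)
}

# Samples of 10, as in this post: roughly 0.72
print(ci_multiplier(10))

# The multiplier shrinks as the sample size grows
print(ci_multiplier(c(5, 10, 25, 100)))
```

For very small samples the multiplier can exceed 1, in which case the 95% interval would be wider than the plus-or-minus one standard deviation plot.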
Thank you to Hadley Wickham for getting me started.