Let’s walk through using R and Student’s t-test to compare paired sample data. The book Statistics: The Exploration & Analysis of Data (6th edition, p505) presents the longitudinal study “Bone mass is recovered from lactation to postweaning in adolescent mothers with low calcium intakes”. The total-body bone mineral content (TBBMC) of young mothers was measured during breastfeeding and then in the postweaning period. The book asks: despite the calcium drain of breastfeeding and low calcium intake, did the mothers gain more than 25 grams of bone mineral content? The average mother’s age is 16 years, so the mothers’ bodies are still growing too. Use a significance level of 0.05.

First, state the null hypothesis and the alternative hypothesis, where μ_{d} is the true mean difference (postweaning minus breastfeeding):

H_{0}: μ_{d} <= 25 grams

H_{a}: μ_{d} > 25 grams

Second, enter the given data into R. The variable *x* is a data frame for mothers 1 through 10. The column *b* stands for the first measurement (during breastfeeding). The column *p* stands for the second measurement (during postweaning period). All values are in grams of bone density.

x <- data.frame(mother=1:10,
                b=c(1928, 2549, 2825, 1924, 1628, 2175, 2114, 2621, 1843, 2541),
                p=c(2126, 2885, 2895, 1942, 1750, 2184, 2164, 2626, 2006, 2627))
# derive the difference
x$diff <- x$p-x$b
# inspect the data frame
x

Before any statistical testing, check the means and the difference of the means. The postweaning average is higher, which is a good sign that we may be able to reject H_{0}.

> mean(x$b)
[1] 2214.8
> mean(x$p)
[1] 2320.5
> mean(x$p)-mean(x$b)
[1] 105.7
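It can also help to look at the ten individual differences themselves before testing. A quick sketch using base R’s sort() and summary() (the data frame is repeated so the snippet runs on its own):

```r
# the same data frame as above, repeated so this snippet is self-contained
x <- data.frame(mother=1:10,
                b=c(1928, 2549, 2825, 1924, 1628, 2175, 2114, 2621, 1843, 2541),
                p=c(2126, 2885, 2895, 1942, 1750, 2184, 2164, 2626, 2006, 2627))
x$diff <- x$p - x$b
sort(x$diff)     # 5 9 18 50 70 86 122 163 198 336
summary(x$diff)  # five-number summary plus the mean (105.7)
```

Every difference is positive, so every mother gained at least some bone mineral content; the question is whether the mean gain exceeds 25 grams.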

Because n=10 is small, the t-test requires that the distribution of the differences be approximately normal. Check using a boxplot and a QQ plot. There is some skew (one mother gained 336 grams), but the plots mostly suggest normality.

boxplot(x$diff)
qqnorm(x$diff)
qqline(x$diff)

Next run the Shapiro-Wilk test of normality. It gives p=0.14, which is large, so we do not reject the null hypothesis that the differences are normally distributed. Remember not to confuse the normality test with the t-test: in the normality test, a large p-value supports the claim that the distribution is normal. In other words, here we want a large p-value.

> shapiro.test(x$diff)

        Shapiro-Wilk normality test

data:  x$diff
W = 0.8824, p-value = 0.1389

Finally we invoke Student’s t-test. But first, to illustrate the same point Peck and Devore make in the book, let’s see that “disregarding the paired nature of the samples results in a loss of information” (p497). Instead of using the paired t-test, use the two-sample test. Because R and the test have no concept of the pairing, they happily produce p=0.33, and because *p* is large, we would fail to reject H_{0}.

> # Remember, the two-sample test is inappropriate.
> t.test(x$p, x$b, mu=25, paired=F, alternative="greater")

        Welch Two Sample t-test

data:  x$p and x$b
t = 0.4495, df = 17.99, p-value = 0.3292
alternative hypothesis: true difference in means is greater than 25
95 percent confidence interval:
 -205.645      Inf
sample estimates:
mean of x mean of y
   2320.5    2214.8

Next let’s do it the right way with a paired t-test, which gives p=0.018. Because *p* is smaller than alpha=0.05, we reject H_{0}. In other words, it is unlikely the observed difference happened by chance, so the mothers probably gained more than 25 grams of bone mineral content.

> t.test(x$p, x$b, mu=25, paired=T, alternative="greater")

        Paired t-test

data:  x$p and x$b
t = 2.4575, df = 9, p-value = 0.01815
alternative hypothesis: true difference in means is greater than 25
95 percent confidence interval:
 45.50299      Inf
sample estimates:
mean of the differences
                  105.7
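Note that t.test() does not just print its results; it returns an object (of class "htest") whose components you can use directly in code instead of reading them off the printout. A small sketch of pulling out the statistic and the p-value and comparing against alpha:

```r
# self-contained: rebuild the data, then capture the test result in a variable
x <- data.frame(mother=1:10,
                b=c(1928, 2549, 2825, 1924, 1628, 2175, 2114, 2621, 1843, 2541),
                p=c(2126, 2885, 2895, 1942, 1750, 2184, 2164, 2626, 2006, 2627))
res <- t.test(x$p, x$b, mu=25, paired=TRUE, alternative="greater")
res$statistic        # the t statistic, about 2.4575
res$p.value          # about 0.01815
res$p.value < 0.05   # TRUE, so we reject H0
```

This is handy when the test is one step inside a larger script, since the decision rule becomes an ordinary boolean expression.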

Now if you don’t want R to do all the test calculations for you, you can use R more like a dumb spreadsheet. The test statistic is the deviation (the mean of the differences minus the hypothesized value) divided by the standard error of the mean:

t = (x̄_{d} - 25) / (s_{d} / √n)

Calculate the test statistic in R:

> x_bar_sub_d <- sum(x$p-x$b)/length(x$p)
> s_sub_d <- sd(x$p-x$b)
> t <- (x_bar_sub_d - 25)/(s_sub_d / sqrt(length(x$p)))
> t
[1] 2.457468

Good: this gives t=2.457468, which is the same as the output of *t.test*. Now rather than checking a table of t critical values, use R to calculate the p-value. Remember this is a directional test, so we want just one tail.

> 1-pt(q=t, df=10-1)
[1] 0.01815458
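Equivalently, pt() can return the upper-tail area directly via its lower.tail argument, which avoids the 1-pt(...) subtraction and lets you pass the computed statistic rather than retyping it. A self-contained sketch of the same calculation:

```r
# rebuild the data and the hand-rolled test statistic from the steps above
x <- data.frame(mother=1:10,
                b=c(1928, 2549, 2825, 1924, 1628, 2175, 2114, 2621, 1843, 2541),
                p=c(2126, 2885, 2895, 1942, 1750, 2184, 2164, 2626, 2006, 2627))
d <- x$p - x$b
n <- length(d)
t_stat <- (mean(d) - 25) / (sd(d) / sqrt(n))
pt(t_stat, df = n - 1, lower.tail = FALSE)  # about 0.0182, matching t.test above
```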

The answer in the back of the book gives t=2.46 and P approximately 0.017, which agrees with our t=2.46 and p=0.018 up to rounding, so we are done.

The exercise for the reader is to explain why the paired test gives a smaller p-value than the two-sample test. Can you draw a graphic?
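A hint to get started: the paired test is built on the spread of the differences, and Var(p - b) = Var(p) + Var(b) - 2 Cov(p, b). Because each mother’s two measurements are highly correlated, the covariance term cancels most of the mother-to-mother variation, so the standard error shrinks. A sketch (the numbers in the comments are approximate):

```r
# self-contained: rebuild the data, then compare between- and within-mother spread
x <- data.frame(mother=1:10,
                b=c(1928, 2549, 2825, 1924, 1628, 2175, 2114, 2621, 1843, 2541),
                p=c(2126, 2885, 2895, 1942, 1750, 2184, 2164, 2626, 2006, 2627))
cor(x$b, x$p)   # about 0.97: mothers with high TBBMC stay high, low stay low
sd(x$b)         # about 397 grams of mother-to-mother spread
sd(x$p)         # about 406 grams
sd(x$p - x$b)   # only about 104 grams: pairing removes the between-mother spread
```

The two-sample test must treat the roughly 400-gram between-mother spread as noise, while the paired test only sees the roughly 100-gram spread of the within-mother changes.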

Paired sample tests are one thing; paired sample analyses, done numerically and graphically, are quite another. See http://www.amstat.org/publications/jse/v17n1/helmreich.pdf for examples that include comprehensive graphics of different forms of two dependent sample analyses.

Install the package granova, or its newest incarnation granovaGG (based on Hadley Wickham’s ggplot2, re: grammar of graphics), to create graphics. (I did this for your particular example, and can say that the results are quite simple to explain as there are few complications in the data (but complications are more common than not in my broad experience).) To be more specific, I used your

x <- data.frame(mother=1:10,
                b=c(1928, 2549, 2825, 1924, 1628, 2175, 2114, 2621, 1843, 2541),
                p=c(2126, 2885, 2895, 1942, 1750, 2184, 2164, 2626, 2006, 2627))

specification (which gives a 3-column data frame called x); then I ran granovagg.ds(x[,-1], revc=T) to get a nice figure, including a 95% CI. Bob Pruzek

Thank you, I’m interested in reading that later and learning how to interpret the image. Meanwhile, here is the image