Plotting individual growth charts

This R code draws individual growth plots as shown in “Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence” by Judith D. Singer and John B. Willett, an excellent book on multilevel modeling and survival analysis.

This code recreates Figure 2.5 on page 32, which has the caption, “OLS summaries of how individuals change over time. Fitted OLS trajectories superimposed on empirical growth plots for participants in the tolerance study.” The main difference is that, by default, ggplot2 draws a 95% confidence region around the regression line. To remove the confidence region, add se=FALSE to geom_smooth().

The data shows the change over time in tolerance of deviant behavior for a group of adolescents in a youth study. I pull the data from UCLA’s web site, which is an invaluable companion for the book. UCLA gives a similar plot using the lattice package, which is included by default in R 2.14. The data is organized in person-period format, which means each person has multiple rows, and each row represents a unique period (age) for that person.
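To make the person-period layout concrete, here is a minimal sketch that reshapes a small person-level (wide) table into person-period (long) form with base R's reshape(). The values and the column names tol11 to tol13 are made up for illustration and are not taken from the UCLA file.

```r
# Made-up person-level (wide) data: one row per person, one column per age.
wide <- data.frame(
  id    = c(1, 2),
  tol11 = c(2.2, 1.1),
  tol12 = c(1.8, 1.3),
  tol13 = c(1.9, 1.5)
)

# Convert to person-period (long) format: one row per person per age.
long <- reshape(wide,
  direction = "long",
  idvar     = "id",
  varying   = c("tol11", "tol12", "tol13"),
  v.names   = "tolerance",
  timevar   = "age",
  times     = 11:13
)

# Sort so that each person's rows sit together, ordered by age.
long <- long[order(long$id, long$age), ]
print(long)
```

Each person now occupies three rows, one per age, which is the shape that the plotting code expects.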

# read data from UCLA's web site
tolerance.pp <- read.table("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1_pp.txt", sep=",", header=TRUE)

# load ggplot2 library (library() stops with an error if the package is missing)
library(ggplot2)

# plot
ggplot(tolerance.pp,  # data set name
	aes(age, tolerance))  + # values for horizontal and vertical axes
	geom_point() +  # scatter plot
	geom_smooth(method="lm") + # OLS regression line with 95% confidence region
	facet_wrap(~id) # separately plot each subject by unique identifier
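To get closer to the book's figure, which shows only the fitted lines, the confidence region can be switched off with se=FALSE. The sketch below uses a small made-up data frame (the name growth and its values are hypothetical) so that it runs without downloading anything. With the real data, tolerance.pp would take its place.

```r
library(ggplot2)

# Made-up person-period data: 4 hypothetical subjects observed at ages 11-15.
set.seed(1)
growth <- expand.grid(age = 11:15, id = 1:4)
growth$tolerance <- 1 + 0.1 * growth$id * (growth$age - 11) +
  rnorm(nrow(growth), sd = 0.1)

p <- ggplot(growth, aes(age, tolerance)) +
  geom_point() +                             # empirical growth plot
  geom_smooth(method = "lm", se = FALSE) +   # OLS line only, no confidence band
  facet_wrap(~id)                            # one panel per subject
print(p)
```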

While I haven’t yet finished reading this rich book, I found this method useful for studying the annual giving patterns of major donors at a non-profit organization. For example, some donors give for years at a low level and suddenly give a large gift, while others gradually increase their annual giving. The method is the same as above except a log coordinate transformation is needed on the Y axis.
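As a rough sketch of the donor version, the example below puts the Y axis on a log scale. The donor data is entirely made up, and the sketch uses scale_y_log10(), which may not be exactly what I used for the real donor plots. All amounts must be positive, since the log of zero is undefined.

```r
library(ggplot2)

# Made-up giving histories for 3 hypothetical donors over 10 years.
# expand.grid() varies year fastest, so rows 1-10 are donor A, 11-20 B, 21-30 C.
donors <- expand.grid(year = 2001:2010, donor = c("A", "B", "C"))
donors$amount <- c(
  rep(50, 9), 5000,           # A: years at a low level, then one large gift
  round(25 * 1.5^(0:9)),      # B: steady percentage growth
  seq(100, 1000, by = 100)    # C: gradual increase by a fixed amount
)

p <- ggplot(donors, aes(year, amount)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # OLS fit to log10(amount)
  scale_y_log10() +                         # log scale on the Y axis
  facet_wrap(~donor)
print(p)
```

One design point: scale_y_log10() transforms the data before the regression is fitted, so the fitted line is straight on the log axis. A coordinate transformation such as coord_trans(y = "log10") is applied after fitting, so a line fitted to the raw amounts appears curved.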


5 thoughts on “Plotting individual growth charts”

  1. Great book indeed. I found that I had little application for the growth modeling aspect (part of the book) because most of the continuous data I deal with are things like revenue $, number of orders/purchases etc from customers – none of which are normally distributed (revenue $ will often have many zeros). So I was interested in your application – did you have this problem with longitudinal analysis of donors?

    • Have you tried a log or square root transformation? When viewing a plot just like this for donors showing revenue over time, I use ggplot’s coordinate transformations, which make the plots easier to read. I haven’t tried any multilevel modeling yet, but I would try transforming the variables. The problem I am more worried about is that after the transformation, the patterns vary: many are now linear, a few are polynomial, and a few are exponential. The ALDA book admits that with three waves it’s hard to make a case for anything except linear, but I can easily get 5-20 years of data for many donors.

  2. @heuristicandrew Thank you for the post! I just released a similar (and simple) package on CRAN a few weeks back called OLScurve that fits various OLS trajectories to data in a way that is much less cumbersome than doing it by hand; it uses lattice instead of ggplot2, however. I didn’t have standard errors in the faceted trajectories, but the next update will, thanks to your post. Thanks for the inspiration! Cheers.

  3. Thanks for the information. While I’m waiting for the book from the library, I had a question: Is there a way to have an outer grouping factor in addition to the individual ID? For example, I have growth rates of individuals that I am grouping into 3 different outcomes. Can you plot all the lines of individuals with a similar outcome in the same plot? In other words, 3 panels with multiple lines within each panel?

    • This code shows how to use the data above to plot all individuals in one plot. Then you would just add a facet_wrap() statement for the variable of the “3 different outcomes.”

      ggplot(tolerance.pp,  aes(age, tolerance, colour=as.factor(id)))  + 
        geom_point() + geom_smooth(method="lm", se=FALSE)
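The facet_wrap() suggestion might look like the sketch below. It uses made-up data, and the outcome column with its three levels is a hypothetical stand-in for the “3 different outcomes.” Grouping by id draws one fitted line per individual, and faceting by outcome gives three panels.

```r
library(ggplot2)

# Made-up person-period data: 9 hypothetical subjects observed at ages 11-15.
set.seed(2)
growth <- expand.grid(age = 11:15, id = 1:9)

# Hypothetical outer grouping factor: three subjects per outcome.
growth$outcome <- factor(
  c("low", "medium", "high")[(growth$id - 1) %/% 3 + 1],
  levels = c("low", "medium", "high")
)
growth$tolerance <- 1 + 0.05 * growth$id * (growth$age - 11) +
  rnorm(nrow(growth), sd = 0.1)

p <- ggplot(growth, aes(age, tolerance, group = id)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # one OLS line per individual
  facet_wrap(~outcome)                      # one panel per outcome group
print(p)
```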
      
