Plot ROC curve and lift chart in R

This tutorial uses working R code to show how to build a predictive model with cforest (an implementation of Breiman’s random forests built on conditional inference trees) from the party package, evaluate the model on a separate set of data, and then plot its performance as an ROC curve and a lift chart using the ROCR package. These charts are useful for evaluating model performance in data mining and machine learning.

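# You only need to install packages once per machine
# (plus maybe after upgrading R), but otherwise they persist across R sessions.
install.packages(c("party", "ROCR"))

# Load the packages: party provides cforest, ctree, and treeresponse;
# ROCR provides prediction and performance.
library(party)
library(ROCR)

# Load the kyphosis data set, which ships with the rpart package.
data(kyphosis, package = "rpart")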

# Split randomly: shuffle the rows, then take the first 75% for training
# and the remaining 25% for evaluation.
x <- kyphosis[sample(1:nrow(kyphosis), nrow(kyphosis), replace = F),]
x.train <- x[1:floor(nrow(x)*.75), ]
x.evaluate <- x[(floor(nrow(x)*.75)+1):nrow(x), ]

# Create a model using "random forest and bagging ensemble algorithms
# utilizing conditional inference trees."
x.model <- cforest(Kyphosis ~ Age + Number + Start, data=x.train,
control = cforest_unbiased(mtry = 3))

# Alternatively, use "recursive partitioning [...] in a conditional
# inference framework."
# x.model <- ctree(Kyphosis ~ Age + Number + Start, data=x.train)

# ctree plots nicely (but cforest doesn't plot)
# plot(x.model)

# Use the model to predict the class of each row in the evaluation set.
x.evaluate$prediction <- predict(x.model, newdata=x.evaluate)

# Calculate the overall accuracy.
x.evaluate$correct <- x.evaluate$prediction == x.evaluate$Kyphosis
print(paste("% of predicted classifications correct", 100 * mean(x.evaluate$correct)))

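# Extract the class probabilities. treeresponse returns, for each row, the
# probabilities of "absent" and "present" in that order, so every other value
# is P(absent), and 1 minus it is P(present).
x.evaluate$probabilities <- 1- unlist(treeresponse(x.model,
newdata=x.evaluate), use.names=F)[seq(1,nrow(x.evaluate)*2,2)]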

# Plot the performance of the model applied to the evaluation set as
# an ROC curve.
pred <- prediction(x.evaluate$probabilities, x.evaluate$Kyphosis)
perf <- performance(pred,"tpr","fpr")
plot(perf, main="ROC curve", colorize=T)

# And then a lift chart
perf <- performance(pred,"lift","rpp")
plot(perf, main="lift curve", colorize=T)
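
To make the ROC and lift plots less of a black box, here is a minimal base R sketch of the quantities behind them, computed by hand on a small made-up example. The vectors scores and labels and the helper curve_points are hypothetical and are not part of the tutorial or of ROCR. For each cutoff, the sketch treats every case scored at or above the cutoff as a predicted positive, then derives the true positive rate, the false positive rate, the rate of positive predictions (RPP), and the lift (precision among predicted positives divided by the overall share of positives).

```r
# Hypothetical toy data: a higher score means "more likely positive".
scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,   0,   1,   0)

# Hypothetical helper: one row per cutoff, cutoffs in decreasing order.
curve_points <- function(scores, labels) {
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  rows <- lapply(cutoffs, function(cutoff) {
    predicted <- scores >= cutoff          # predicted positive at this cutoff
    tp <- sum(predicted & labels == 1)     # true positives
    fp <- sum(predicted & labels == 0)     # false positives
    data.frame(
      cutoff = cutoff,
      tpr    = tp / sum(labels == 1),      # y-axis of the ROC curve
      fpr    = fp / sum(labels == 0),      # x-axis of the ROC curve
      rpp    = mean(predicted),            # x-axis of the lift curve
      lift   = (tp / sum(predicted)) / mean(labels == 1)  # y-axis of the lift curve
    )
  })
  do.call(rbind, rows)
}

pts <- curve_points(scores, labels)
print(pts)
```

Plotting pts$tpr against pts$fpr traces the ROC curve, and pts$lift against pts$rpp traces the lift curve. Plotting pts$tpr against pts$rpp instead gives the cumulative gains view (% of responses against % of sample); lift is simply the ratio of those two quantities, which is why the last row, where everything is predicted positive, has a lift of 1.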

This tutorial was tested on Linux and Windows with R 2.9.

Here are some exercises for the reader:

  1. Why use mtry = 3? Compare different values, or take out the control = ... argument.
  2. Output the results to PDF for printing.
  3. Try ctree instead of cforest. Which is better?
  4. Replace cforest with other classifiers: rpart, randomForest, or svm (e1071).
  5. Use 10-fold cross-validation instead of the simple splitting (though the party package has cross-validation ‘built in’).
  6. Combine two performance curves (for two different classifiers or settings) in one plot.
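
As a possible starting point for exercises 4 and 5, here is a hedged sketch of 10-fold cross-validation. It uses rpart in place of cforest only because rpart ships with R; the names fold, cv.prob, and cv.class and the seed value are illustrative choices, not part of the tutorial.

```r
library(rpart)                      # ships with R and contains the kyphosis data
data(kyphosis, package = "rpart")

set.seed(42)  # arbitrary seed, only so the fold assignment is reproducible
k <- 10
# Assign each row to one of k folds at random, keeping fold sizes nearly equal.
fold <- sample(rep(1:k, length.out = nrow(kyphosis)))

# Hold out each fold in turn, train on the rest, and score the held-out rows.
cv.prob <- numeric(nrow(kyphosis))
for (i in 1:k) {
  held.out <- fold == i
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis[!held.out, ])
  prob <- predict(fit, newdata = kyphosis[held.out, ], type = "prob")
  cv.prob[held.out] <- prob[, "present"]
}

# Every row now has P(present) from a model that never saw that row.
cv.class <- ifelse(cv.prob > 0.5, "present", "absent")
print(mean(cv.class == kyphosis$Kyphosis))
```

Because cv.prob holds one out-of-fold probability per row, it can be passed to ROCR's prediction together with kyphosis$Kyphosis in the same way the tutorial passes x.evaluate$probabilities, so the ROC and lift plots then reflect all rows rather than a single 25% hold-out.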

For a similar but more detailed tutorial, read “Guide to Credit Scoring in R” by Dhruv Sharma.

If this programming is too much for you, try rattle (a graphical interface to R for data mining) or Weka (a machine learning suite). Otherwise, go on to the next tutorial: Compare performance of machine learning classifiers in R.

6 thoughts on “Plot ROC curve and lift chart in R”

  1. Pingback: Compare performance of machine learning classifiers in R « Heuristic Andrew

  2. Pingback: Identifing Potential Customers with Classification Techniques in R Language | Data Apple

  3. This is a great article, thanks for the post.

    Please help me understand the difference. I thought a lift chart plots % of responses against % of sample size.
    Why is the lift curve plotted as lift value vs. RPP?

  4. Hi Andrew,
    Thank you very much for such a great article.
    I plotted a lift chart for one of my projects using R, but I don't know how to get the table that is used for the plot.
    Please let me know how to get a decile-wise lift table using R.

    Thanks in advance.

  5. You could set a seed and use stratified sampling with caTools::sample.split instead of the simple random split, for a better train/test split ratio:

    set.seed(1)  # any fixed value works; it only makes the split reproducible
    x.split <- caTools::sample.split(kyphosis$Kyphosis, SplitRatio=0.75)
    x.train <- kyphosis[x.split==TRUE,]
    x.evaluate <- kyphosis[x.split==FALSE,]
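
On the questions above about a decile-wise lift table and about "% of responses vs % of sample size", one way to build such a table in base R is sketched below. The inputs prob and actual are made-up values chosen for illustration; in the tutorial they would correspond to x.evaluate$probabilities and to whether x.evaluate$Kyphosis is "present". The column names are likewise illustrative.

```r
# Hypothetical inputs: 20 scored cases; actual == 1 marks a responder.
prob   <- (19:0) / 20
actual <- c(1,1,1,0,1, 1,0,1,0,0, 1,0,0,0,0, 0,1,0,0,0)

# Sort by descending score and cut the sorted cases into 10 equal-sized groups.
ord    <- order(prob, decreasing = TRUE)
decile <- ceiling(seq_along(ord) * 10 / length(ord))

lift.table <- data.frame(
  decile    = 1:10,
  cases     = as.vector(tapply(actual[ord], decile, length)),
  responses = as.vector(tapply(actual[ord], decile, sum))
)
# Cumulative share of the sample and of the responders captured so far.
lift.table$cum.pct.cases     <- 100 * cumsum(lift.table$cases) / sum(lift.table$cases)
lift.table$cum.pct.responses <- 100 * cumsum(lift.table$responses) / sum(lift.table$responses)
# Cumulative lift: how much better than random targeting the top deciles are.
lift.table$cum.lift          <- lift.table$cum.pct.responses / lift.table$cum.pct.cases
print(lift.table)
```

Plotting cum.pct.responses against cum.pct.cases gives the cumulative gains chart (% of responses vs % of sample), while cum.lift is the ratio of the two, which is the quantity the tutorial's lift curve shows against RPP. The two charts therefore present the same information in different forms.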
