Plot ROC curve and lift chart in R

This tutorial uses real R code to show how to build a predictive model with cforest (a random forest built from conditional inference trees, in the spirit of Breiman’s random forests) from the party package, evaluate the model on a separate set of data, and then plot its performance as an ROC curve and a lift chart. These charts are useful for evaluating model performance in data mining and machine learning.

```
# You only need to install packages once per machine
# (plus maybe after upgrading R), but otherwise they persist across R sessions.
install.packages('party')
install.packages('ROCR')

# Load the kyphosis data set.
require(rpart)

# Shuffle the rows, then split 75% / 25% into training and evaluation sets
x <- kyphosis[sample(nrow(kyphosis)), ]
x.train <- x[1:floor(nrow(x)*.75), ]
x.evaluate <- x[(floor(nrow(x)*.75)+1):nrow(x), ]

# Create a model using "random forest and bagging ensemble algorithms
# utilizing conditional inference trees."
require(party)
x.model <- cforest(Kyphosis ~ Age + Number + Start, data=x.train,
control = cforest_unbiased(mtry = 3))

# Alternatively, use "recursive partitioning [...] in a conditional
# inference framework."
# x.model <- ctree(Kyphosis ~ Age + Number + Start, data=x.train)

# ctree plots nicely (but cforest doesn't plot)
# plot (x.model)

# Use the model to predict the evaluation.
x.evaluate$prediction <- predict(x.model, newdata=x.evaluate)

# Calculate the overall accuracy.
x.evaluate$correct <- x.evaluate$prediction == x.evaluate$Kyphosis
print(paste("% of predicted classifications correct", 100 * mean(x.evaluate$correct)))

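# Extract the class probabilities. treeresponse gives both class
# probabilities per row; take the first of each pair and subtract it from 1
# to get the probability of the second class.
x.evaluate$probabilities <- 1 - unlist(treeresponse(x.model,
newdata=x.evaluate), use.names=FALSE)[seq(1,nrow(x.evaluate)*2,2)]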

# Plot the performance of the model applied to the evaluation set as
# an ROC curve.
require(ROCR)
pred <- prediction(x.evaluate$probabilities, x.evaluate$Kyphosis)
perf <- performance(pred,"tpr","fpr")
plot(perf, main="ROC curve", colorize=T)

# And then a lift chart
perf <- performance(pred,"lift","rpp")
plot(perf, main="lift curve", colorize=T)
```
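
The lift chart above plots lift against the rate of positive predictions (rpp). As a rough sketch of what those two axes mean, the following base R code builds a small binned lift table by hand. The scores and labels are made-up toy values, and `lift_table` is a hypothetical helper written for this illustration; it is not part of ROCR or party, and it ignores tied scores.

```r
# Toy data (made up for illustration): predicted probabilities and true classes.
scores <- c(0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10)
actual <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)  # 1 = positive class

# lift_table is a hypothetical helper, not part of ROCR or party.
lift_table <- function(scores, actual, bins = 5) {
  ord <- order(scores, decreasing = TRUE)  # highest-scored cases first
  actual <- actual[ord]
  n <- length(actual)
  # Assign each case to one of `bins` roughly equal-sized groups by rank.
  bin <- ceiling(seq_len(n) * bins / n)
  cum.n <- as.vector(tapply(seq_len(n), bin, max))  # cases selected so far
  cum.pos <- cumsum(actual)[cum.n]                  # positives captured so far
  data.frame(
    bin  = seq_len(bins),
    rpp  = cum.n / n,                        # rate of positive predictions
    gain = cum.pos / sum(actual),            # share of all positives captured
    lift = (cum.pos / cum.n) / mean(actual)  # hit rate relative to base rate
  )
}

print(lift_table(scores, actual))
```

To try this on the evaluation set from the tutorial, one option is to pass `x.evaluate$probabilities` as the scores and `as.numeric(x.evaluate$Kyphosis == "present")` as the labels, with `bins = 10` for deciles. The `gain` column is the quantity shown in a cumulative gains chart (% of responses vs. % of sample); in ROCR the corresponding pair of measures is `"tpr"` against `"rpp"`.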

This tutorial was tested on Linux and Windows with R 2.9.

Here are some exercises for the reader:

1. Why use mtry = 3? Compare different values, or remove the control = ... argument.
2. Output the results to PDF for printing.
3. Try ctree instead of cforest. Which is better?
4. Replace cforest with other classifiers: rpart, randomForest, or svm (e1071).
5. Use 10-fold cross-validation instead of the simple splitting (though the party packages have cross-validation ‘built in.’).
6. Combine two performance curves (for two different classifiers or settings) in one plot.
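
As a starting point for exercises 2 and 6, here is a minimal sketch that draws two ROC curves in one plot, writes the plot to a PDF, and prints the area under each curve. The labels and scores are made-up toy values standing in for two classifiers, and the file name is arbitrary; in practice you would build `pred.a` and `pred.b` from your own models' probabilities on the same evaluation set.

```r
library(ROCR)

# Toy scores from two hypothetical classifiers on the same six cases
# (made-up numbers for illustration only).
labels   <- c(1, 1, 0, 1, 0, 0)
scores.a <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.2)
scores.b <- c(0.6, 0.4, 0.8, 0.5, 0.7, 0.3)

pred.a <- prediction(scores.a, labels)
pred.b <- prediction(scores.b, labels)

# Open a PDF device; everything plotted until dev.off() goes into the file.
out.file <- file.path(tempdir(), "roc-comparison.pdf")
pdf(out.file)
plot(performance(pred.a, "tpr", "fpr"), col = "blue", main = "ROC curves")
plot(performance(pred.b, "tpr", "fpr"), col = "red", add = TRUE)
abline(0, 1, lty = 2)  # reference line for a random classifier
legend("bottomright", legend = c("classifier A", "classifier B"),
       col = c("blue", "red"), lty = 1)
invisible(dev.off())

# Area under each ROC curve as a single-number summary.
auc.a <- performance(pred.a, "auc")@y.values[[1]]
auc.b <- performance(pred.b, "auc")@y.values[[1]]
print(c(A = auc.a, B = auc.b))
```

The second `plot` call uses `add = TRUE` so that it draws on top of the first curve instead of starting a new page. The same pattern should work for lift curves by swapping the measures to `"lift"` and `"rpp"`.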

For a similar but more detailed tutorial, read “Guide to Credit Scoring in R” by Dhruv Sharma.

If this programming is too much for you, try rattle (a graphical interface to R for data mining) or Weka (a machine learning suite). Otherwise, go on to the next tutorial: Compare performance of machine learning classifiers in R.

6 thoughts on “Plot ROC curve and lift chart in R”

1. Suri says:

This is a great article, thanks for the post.

Please help me understand the difference. I thought a lift chart plots % of responses vs. % of sample size.
Why is the lift curve plotted as lift value vs. RPP?

• hxy0135 says:

Thank you for the article. It is very helpful!

2. madhu g says:

Hi Andrew,
Thank you very much for such a great article.
I plotted a lift chart for one of my projects using R, but I don't know how to get the table that is used for the plot.
Please let me know how to get a decile-wise lift table using R.