Weighting model fit with ctree in party

Conditional inference trees (ctree) in the party package allow case weights, which is useful when one classification outcome is more important than another. Examples are not difficult to imagine: in a direct-mail marketing campaign, a false positive (a mailing that gets no response) costs just paper and postage (say, $0.50), while a true positive (a response) may be worth $100.00. In a medical diagnostic test, a false negative when screening for cancer could be fatal.
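To make the asymmetry in the mailing example concrete, here is a small back-of-the-envelope sketch in base R. It assumes the $100.00 is the gross value of a response before the mailing cost is subtracted, and expected_profit() is a hypothetical helper named here only for illustration.

```r
# Figures from the mailing example (the $100.00 is assumed to be gross value)
cost_per_mailing   <- 0.50    # paper and postage
value_per_response <- 100.00  # value of one response

# Hypothetical helper: expected profit of one mailing
# when the recipient responds with probability p
expected_profit <- function(p) p * value_per_response - cost_per_mailing

# Response probability at which a mailing breaks even
break_even <- cost_per_mailing / value_per_response

break_even             # 0.005
expected_profit(0.02)  # 1.5
```

Under these assumptions a mailing pays off whenever the response probability is above 0.5%, which is why a model that trades many false positives for a few extra true positives can still be the better one.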

The following code demonstrates how to use weighting with ctree in party in R 2.10.

# load the mlbench package, which contains the BreastCancer data set
# (if any required package is missing, install it first with install.packages())
require(mlbench)

# load the data set
data(BreastCancer)

# remove the unique identifier, which is useless and would confuse the machine learning algorithms
BreastCancer$Id <- NULL

# randomly partition the data set into roughly 80% training and 20% evaluation (adapted from ?randomForest)
set.seed(2)
ind <- sample(2, nrow(BreastCancer), replace = TRUE, prob=c(0.8, 0.2))

# model using ctree without weights
require(party)
x.ct <- ctree(Class ~ ., data=BreastCancer[ind == 1,])
x.ct.pred <- predict(x.ct, newdata=BreastCancer[ind == 2,])
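# probability of 'malignant': treeresponse() returns one c(benign, malignant) vector per row,
# so take every odd element of the flattened list (benign) and subtract it from 1
x.ct.prob <- 1 - unlist(treeresponse(x.ct, BreastCancer[ind == 2,]), use.names=FALSE)[seq(1,nrow(BreastCancer[ind == 2,])*2,2)]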

# model using ctree with weights 1:10 (benign:malignant)
x.ctw <- ctree(Class ~ ., data=BreastCancer[ind == 1,], weights= ifelse(BreastCancer[ind == 1,]$Class=='benign', 1, 10))
x.ctw.pred <- predict(x.ctw, newdata=BreastCancer[ind == 2,])
x.ctw.prob <- 1 - unlist(treeresponse(x.ctw, BreastCancer[ind == 2,]), use.names=FALSE)[seq(1,nrow(BreastCancer[ind == 2,])*2,2)]

# model using ctree with weights 10:1 (benign:malignant)
x.ctw2 <- ctree(Class ~ ., data=BreastCancer[ind == 1,], weights= ifelse(BreastCancer[ind == 1,]$Class=='benign', 10, 1))
x.ctw2.pred <- predict(x.ctw2, newdata=BreastCancer[ind == 2,])
x.ctw2.prob <- 1 - unlist(treeresponse(x.ctw2, BreastCancer[ind == 2,]), use.names=FALSE)[seq(1,nrow(BreastCancer[ind == 2,])*2,2)]


# Output the plot to a PNG file for display on web.  To draw to the screen,
# comment this line out.
png(filename="roc_curve_weights.png", width=700, height=700)

# plot performance
require(ROCR)

# create an ROCR prediction object from probabilities
x.ct.prob.rocr <- prediction(x.ct.prob, BreastCancer[ind == 2,'Class'])
# prepare an ROCR performance object for ROC curve (tpr=true positive rate, fpr=false positive rate)
x.ct.perf <- performance(x.ct.prob.rocr, "tpr","fpr")
# plot it
plot(x.ct.perf, col=2, main="ROC curves comparing classification performance of ctree weighted vs unweighted")

# Draw a legend.
legend(0.6, 0.6, c('unweighted', 'weighted 1:10 (benign:malignant)', 'weighted 10:1'), 2:4)

# weighted 1:10
x.ctw.prob.rocr <- prediction(x.ctw.prob, BreastCancer[ind == 2,'Class'])
x.ctw.perf <- performance(x.ctw.prob.rocr, "tpr","fpr")
# add=TRUE draws on the existing chart
plot(x.ctw.perf, col=3, add=TRUE)

# weighted 10:1
x.ctw2.prob.rocr <- prediction(x.ctw2.prob, BreastCancer[ind == 2,'Class'])
x.ctw2.perf <- performance(x.ctw2.prob.rocr, "tpr","fpr")
# add=TRUE draws on the existing chart
plot(x.ctw2.perf, col=4, add=TRUE)

# close and save PNG
dev.off()
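
The treeresponse() lines in the script are dense, so here is a minimal base R sketch of what the unlist()/seq() idiom does. The list toy_response is a made-up stand-in for treeresponse() output (one benign/malignant probability pair per observation), not real model output.

```r
# Toy stand-in for treeresponse() output: one c(benign, malignant)
# probability vector per observation (made-up numbers)
toy_response <- list(c(0.9, 0.1), c(0.2, 0.8), c(0.5, 0.5))

# Idiom used in the script: flatten the list, keep the odd positions
# (the benign probabilities), then subtract from 1 to get P(malignant)
flat <- unlist(toy_response, use.names = FALSE)
p_malignant <- 1 - flat[seq(1, length(toy_response) * 2, 2)]

# Equivalent and arguably more direct: take the second element of each vector
p_malignant_alt <- sapply(toy_response, function(p) p[2])

p_malignant  # 0.1 0.8 0.5
```

Both forms give the same vector; the sapply() version just makes it more obvious which class probability is being extracted.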

This ROC curve compares the performance of the decision tree model unweighted, weighted 1:10, and weighted 10:1.
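
If you want a single number to go with each curve, the area under the ROC curve (AUC) is a common summary. The sketch below computes it in base R from the rank-sum (Mann-Whitney) identity; auc_rank() is a hypothetical helper and the scores and labels are made-up toy values, not results from the models above.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) identity.
# scores   = predicted probability of the positive class
# labels   = true classes
# positive = label of the positive class
auc_rank <- function(scores, labels, positive) {
  pos   <- labels == positive
  r     <- rank(scores)  # ties receive average ranks
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up toy data, for illustration only
toy_scores <- c(0.1, 0.4, 0.35, 0.8)
toy_labels <- factor(c("benign", "benign", "malignant", "malignant"))

toy_auc <- auc_rank(toy_scores, toy_labels, "malignant")
toy_auc  # 0.75

# With the ROCR objects from the script, the same quantity should be available as
# performance(x.ct.prob.rocr, "auc")@y.values[[1]]
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the three models can be compared by this value as well as by eye.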

To use weights with cforest (random forests), replace ctree with cforest throughout the code above: the other arguments and code stay the same.
