Train neural network in R, predict in SAS

This R code fits an artificial neural network in R and generates Base SAS code, so new records can be scored entirely in Base SAS.

This is intended to be a simple, elegant, fast solution. You don’t need SAS Enterprise Miner, IML, or any other special licenses, and R is free. You don’t need PMML. Unlike other methods that accomplish a similar goal, SAS never invokes R, and the two systems never pass records to each other.

This code uses the nnet package, which supports a single hidden layer. It supports R factors (nominal variables) and networks with an arbitrary number of units in the input and hidden layers, but only one output unit. With a little work, you could support multiple output units and other target programming languages such as C, Java, or Python. Skip connections are not supported.
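These limits can be checked before any SAS code is generated. The sketch below defines a hypothetical helper, nnet2sas_supported(). It reads only fit$n (the layer sizes) and fit$wts (the weights), which are the same components nnet2sas() uses. It assumes that a network fitted with skip-layer connections carries one extra weight per input/output pair. It does not detect a linear output unit (nnet(..., linout=TRUE)). nnet2sas() would also translate that case incorrectly, because it always applies the logistic function at the output.

```r
# Hypothetical helper: TRUE if an nnet fit has the shape nnet2sas() handles,
# i.e. one hidden layer, a single output unit, and no skip-layer weights.
nnet2sas_supported <- function(fit) {
  n <- fit$n  # c(input units, hidden units, output units)
  # Without skip connections: (inputs + bias) weights per hidden unit,
  # plus (hidden units + bias) weights per output unit.
  expected <- (n[1] + 1) * n[2] + (n[2] + 1) * n[3]
  length(n) == 3 && n[3] == 1 && length(fit$wts) == expected
}

# Toy objects shaped like nnet fits (only $n and $wts are used).
ok   <- list(n = c(6, 2, 1), wts = numeric(17))
skip <- list(n = c(6, 2, 1), wts = numeric(17 + 6))  # 6 extra input-to-output weights
nnet2sas_supported(ok)    # TRUE
nnet2sas_supported(skip)  # FALSE
```

Calling `stopifnot(nnet2sas_supported(fit))` right after fitting would stop the script before unsupported SAS code is written.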

The first piece of code includes the metaprogramming function and a demonstration of its use.
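The SAS statements that the function writes compute the forward pass below. In this notation:

- n_1 and n_2 are fit$n[1] and fit$n[2].
- x_i is input i (the SAS variables i1, i2, and so on).
- g is the logistic function.
- w_k stands for fit$wts[k], with 1-based indexing.

The subscripts restate the unit.offset and output.offset arithmetic used in the function. They are a reading of that code, not of nnet's documentation.

```latex
\begin{aligned}
z^2_h &= w_{(n_1+1)(h-1)+1} + \sum_{i=1}^{n_1} w_{(n_1+1)(h-1)+1+i}\, x_i, && h = 1, \dots, n_2 \\
a^2_h &= g\!\left(z^2_h\right) \\
z^3_1 &= w_{(n_1+1)n_2+1} + \sum_{h=1}^{n_2} w_{(n_1+1)n_2+1+h}\, a^2_h \\
o &= g\!\left(z^3_1\right), && g(z) = \frac{1}{1 + e^{-z}}
\end{aligned}
```

For the 6-2-1 network in the example, the weights are used as follows:

- w_1 to w_7 belong to the first hidden unit.
- w_8 to w_14 belong to the second hidden unit.
- w_15 to w_17 belong to the output unit.

In each group, the first weight is the bias.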

# Copyright (C) 2011 Andrew Ziem
# Licensed under the GNU General Public License version 2 
# or later <https://www.gnu.org/licenses/gpl-2.0.html>
 

###
### prepare demonstration data
###

require(earth) # for the etitanic data
data(etitanic)
mydata <- etitanic
mydata$survived <- as.factor(ifelse(etitanic$survived==1, 'T', 'F'))
summary(mydata)



###
### fit a neural network
###

require(nnet)
set.seed(1) # nnet() starts from random weights; fix the seed for reproducible results
fit <- nnet(survived ~ ., data=mydata, size=2, maxit=1000)


###
### inspect the fitted model
###


fit
summary(fit)
str(fit)
fit$coefnames
fit$wts


###
### metaprogram SAS code
###

# notation:
# zji refers to z^j_i, the weighted input (including bias) of unit i in layer j
# aji refers to a^j_i = g(z^j_i), the activation of that unit, where g is the sigmoid (logistic) function

nnet2sas <- function(fit)
{
	sas <- paste("/* neural network size", paste(fit$n, collapse='-'), "*/\n")
	sas <- paste(sas, '/* inputs: ', paste(fit$coefnames, collapse=' '), ' */\n', sep='')
	sas <- paste(sas, '/* this macro handles extreme values */\n')
	sas <- paste(sas, '%macro logistic(z);\n')
	sas <- paste(sas, '1/(1+exp(min(max(-(&z),-500),500)))\n')
	sas <- paste(sas, '%mend;\n')


	# Define the input layer.
	# If there are factors, then in SAS you will have to manually change
	# something like 'pclass2nd' to 'pclass eq "2nd"'.
	# Also, this is the place to apply a range transformation (if any).
	
	for (input in 1:fit$n[1]){
		sas <- paste(sas, 'i', input,' = ',fit$coefnames[input],';\n',sep='')
	}
	

	# compute the hidden layer from the input layer
	for (h in 1:fit$n[2]) {
		unit.offset <- (fit$n[1]+1)*(h-1)+1
		z2 <- c()
		# bias unit (intercept)
		z2[1] <- paste('z2',h,' = ',fit$wts[unit.offset],sep='')
		# loop through input layer
		for (input in 1:fit$n[1]){
			z2[input+1] <- paste('(',fit$wts[unit.offset+input],' * i', input, ')', sep='')
		}
		sas <- paste(sas, paste(z2, collapse='+'),';\n',sep='')
		sas <- paste(sas, 'a2',h," = %logistic(z2",h,");\n", sep='')
	}

	# compute the output layer from the hidden layer
	output.offset <- (fit$n[1]+1)*(fit$n[2])
	z3<-c()
	# bias unit
	z3[1] <- paste('z31 = ',fit$wts[output.offset+1],sep='')
	# loop through the hidden layer
	for (h in 1:fit$n[2]) {
		z3[h+1] <- paste('(',fit$wts[output.offset + h + 1],' * a2', h, ')', sep='')
	}
	
	sas <- paste(sas, paste(z3, collapse='+'),';\n',sep='')
	sas <- paste(sas, "o = %logistic(z31);\n", sep='')

	# clean up temporary SAS variables
	sas <- paste(sas,
		paste('drop ',paste('i', 1:fit$n[1], collapse=' ', sep=''), ' ',
		paste('z2', 1:fit$n[2], collapse=' ', sep=''),' ',
		paste('a2', 1:fit$n[2], collapse=' ', sep=''),
		';\n', sep=''))

	
	return(sas)
}

# This is how to invoke the metaprogramming function with
# the fitted neural network as the input.
sascode <- nnet2sas(fit)

# Print the SAS code to the R console. print() would show the line breaks
# as \n escapes; cat() writes them as real line breaks, so the output
# can be pasted straight into SAS.
cat(sascode)

# Alternatively, copy the code to the clipboard and paste it into the
# SAS editor yourself. writeClipboard() is available only in R for Windows.
writeClipboard(sascode)



###
### Calculate predictions in R for comparison with SAS to verify accuracy.
###

# predict() returns a one-column matrix; as.vector() stores it as an ordinary column
mydata$prediction <- as.vector(predict(fit, mydata, type="raw"))


###
### For demonstration and error checking, 
### transfer the titanic data set from R to SAS.
###

write.csv(mydata, "etitanic.csv", row.names=FALSE, quote=FALSE)
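Before switching to SAS, you can check the weight offsets entirely in R. The sketch below defines a hypothetical helper, nnet_forward(), that repeats the arithmetic of the generated SAS statements. It uses the same unit.offset and output.offset and the same logistic activations. It expects a numeric input matrix whose columns are in the order of fit$coefnames.

```r
# Hypothetical helper: forward pass of a single-hidden-layer, single-output
# nnet fit, using the same weight offsets as nnet2sas().
nnet_forward <- function(fit, x) {
  n1 <- fit$n[1]
  n2 <- fit$n[2]
  g <- function(z) 1 / (1 + exp(-z))           # logistic activation
  x <- as.matrix(x)
  stopifnot(ncol(x) == n1)
  a2 <- matrix(0, nrow(x), n2)                 # hidden-layer activations
  for (h in seq_len(n2)) {
    unit.offset <- (n1 + 1) * (h - 1) + 1      # position of hidden unit h's bias
    w <- fit$wts[unit.offset + 0:n1]           # bias, then one weight per input
    a2[, h] <- g(w[1] + x %*% w[-1])
  }
  output.offset <- (n1 + 1) * n2
  v <- fit$wts[output.offset + 1 + 0:n2]       # output bias, then one weight per hidden unit
  as.vector(g(v[1] + a2 %*% v[-1]))
}
```

To use it with the fitted model above, build the input matrix with something like `X <- model.matrix(survived ~ ., mydata)[, fit$coefnames]`. Then `all.equal(nnet_forward(fit, X), as.vector(predict(fit, mydata, type = "raw")))` should return TRUE. This assumes model.matrix() builds the dummy columns the same way nnet() did during fitting.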

The following SAS code shows the final product.

proc import
	dbms=csv
	datafile="etitanic.csv"
	replace
	out=etitanic;
run;


data score;
	set etitanic;

	/* translate character variables to binary features */
	pclass2nd = pclass eq '2nd' ;
	pclass3rd = pclass eq '3rd';
	sexmale = sex eq 'male';

	/* neural network size 6-2-1 */
	/* inputs: pclass2nd pclass3rd sexmale age sibsp parch */
	/* this macro handles extreme values */
	%macro logistic(z);
	1/(1+exp(min(max(-(&z),-500),500)))
	%mend;
	i1 = pclass2nd;
	i2 = pclass3rd;
	i3 = sexmale;
	i4 = age;
	i5 = sibsp;
	i6 = parch;
	z21 = -16.9776652947204+(14.7026169229765 * i1)+(16.7606260909055 * i2)+(16.9151745267051 * i3)+(0.0306285658451077 * i4)+(0.313646974183414 * i5)+(-0.00727195086344083 * i6);
	a21 = %logistic(z21);
	z22 = -17.2415233909577+(-5.31218427149218 * i1)+(-11.2637030335919 * i2)+(38.972433341002 * i3)+(-1.45864469788838 * i4)+(-4.67854483939547 * i5)+(3.56308489203666 * i6);
	a22 = %logistic(z22);
	z31 = 3.33062067376837+(-5.23160961746755 * a21)+(4.10482439694368 * a22);
	o = %logistic(z31);
	drop i1 i2 i3 i4 i5 i6 z21 z22 a21 a22;

	/* verify SAS's calculation is close to R's calculation */
	error = o - prediction;
run;
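Both o in SAS and prediction in R are probabilities. For a two-level factor response, nnet models the probability of the second factor level. The levels of survived sort as 'F', 'T', so here both values estimate P(survived = 'T').

Turning that probability into a class label takes a cutoff. The R sketch below uses a hypothetical helper, prob2class(), with a 0.5 cutoff. It is intended to mirror predict(fit, mydata, type = "class"). In the SAS data step, a statement like `survived_pred = ifc(o > 0.5, 'T', 'F');` would do the same.

```r
# Hypothetical helper: convert the network's output probability into a class
# label, assuming the output estimates P(response == second factor level).
prob2class <- function(p, lev, cutoff = 0.5) {
  stopifnot(length(lev) == 2)
  factor(ifelse(p > cutoff, lev[2], lev[1]), levels = lev)
}

prob2class(c(0.2, 0.7, 0.5), lev = c("F", "T"))
# [1] F T F
# Levels: F T
```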

Depending on how your nominal variables are encoded, you will need to handle them differently, with a little manual work that is beyond the scope of the metaprogramming. One simple solution is to encode them as binary features in the data set before fitting; nnet() does this internally anyway. A second scenario, the one in the example above, is that R encodes pclass as a factor while SAS stores it as a character variable. In that case, SAS recodes pclass to the binary features pclass2nd and pclass3rd. If SAS encodes the nominal variable as a number with a format, the solution is very similar.
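You can also generate the recoding statements for the character-variable scenario instead of typing them. The sketch below defines a hypothetical helper, factor2sas(). It makes these assumptions:

- The factors are unordered and use R's default treatment contrasts, so each indicator is named by pasting the variable name and the level (as in pclass2nd).
- The levels are valid in SAS variable names.
- The levels contain no quote characters.

```r
# Hypothetical helper: write SAS statements that rebuild the binary features
# created from unordered factors under R's default treatment contrasts
# (one indicator per level except the first, named <variable><level>).
factor2sas <- function(df, exclude = character(0)) {
  sas <- character(0)
  for (v in setdiff(names(df), exclude)) {
    x <- df[[v]]
    if (!is.factor(x) || is.ordered(x)) next   # ordered factors use other contrasts
    for (l in levels(x)[-1]) {                 # the first level is the baseline
      sas <- c(sas, sprintf("%s%s = (%s eq '%s');", v, l, v, l))
    }
  }
  paste(sas, collapse = "\n")
}

df <- data.frame(pclass = factor(c("1st", "2nd", "3rd")),
                 sex = factor(c("female", "male", "male")),
                 age = c(29, 2, 30),
                 survived = factor(c("T", "F", "F")))
writeLines(factor2sas(df, exclude = "survived"))
# pclass2nd = (pclass eq '2nd');
# pclass3rd = (pclass eq '3rd');
# sexmale = (sex eq 'male');
```

For the example data, `writeLines(factor2sas(mydata, exclude = "survived"))` should print the same three recoding statements used in the data step above, with added parentheses.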

Did you notice the magic number 500? For more information about the logistic macro function, see “Logistic function macro for SAS.”
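The number 500 keeps exp() in range. A double-precision exp() overflows for arguments just above 709. SAS has no floating-point infinity to fall back on, so an overflowing EXP() call would presumably yield a missing value rather than 0 or 1. The R sketch below reproduces the macro's clamped arithmetic. logistic_clamped() is a hypothetical name, and plogis() is R's built-in logistic distribution function, used here only as a reference.

```r
# Sketch of the %logistic macro's arithmetic in R: -z is clamped to
# [-500, 500] before exp(), so exp() never sees an argument that overflows
# double precision (that happens just above 709).
logistic_clamped <- function(z) 1 / (1 + exp(pmin(pmax(-z, -500), 500)))

exp(710)                         # Inf: the overflow the clamp avoids
z <- seq(-30, 30, by = 0.5)
all.equal(logistic_clamped(z), plogis(z))   # TRUE
logistic_clamped(c(-1000, 1000))            # about 7.1e-218, and exactly 1
```

The clamp changes results only for |z| > 500, where the output is already within about 1e-217 of 0 or 1.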

Why stop with a neural network? Make an ensemble by doing the same with decision trees: see “Model decision tree in R, score in Base SAS.”

UPDATE October 2012: See “nnet2sas() supports centering and scaling” for version 2 of nnet2sas().
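As a rough illustration of the idea, the comments in nnet2sas() already note that the input layer is the place to apply a transformation. The hypothetical helper inputs2sas() below writes the input-layer statements and standardizes selected inputs as (x - center) / scale. It assumes the network was fit in R on inputs standardized with exactly these centers and scales. It is a sketch, not the version 2 implementation.

```r
# Hypothetical sketch (not the version 2 code): emit the input-layer SAS
# statements, standardizing any input listed in `center`/`scale` the same
# way it was standardized in R before fitting.
inputs2sas <- function(coefnames, center = numeric(0), scale = numeric(0)) {
  out <- character(length(coefnames))
  for (k in seq_along(coefnames)) {
    v <- coefnames[k]
    if (v %in% names(center)) {
      # %.15g writes 15 significant digits, like the weights nnet2sas() writes
      out[k] <- sprintf("i%d = (%s - %.15g) / %.15g;", k, v, center[[v]], scale[[v]])
    } else {
      out[k] <- sprintf("i%d = %s;", k, v)     # dummy or untransformed input
    }
  }
  paste(out, collapse = "\n")
}

writeLines(inputs2sas(c("sexmale", "age"), center = c(age = 30), scale = c(age = 14)))
# i1 = sexmale;
# i2 = (age - 30) / 14;
```

The centers and scales could come from, for example, `sapply(mydata[c("age", "sibsp", "parch")], mean)` and the matching `sd` values. They would have to be applied to the data before calling nnet().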
