Artificial neural networks are commonly thought of as classifiers, largely because of their relationship to logistic regression: neural networks typically use a logistic activation function and, like logistic regression, output values between 0 and 1. However, the ability of neural networks to model complex, non-linear hypotheses is desirable for many real-world problems, including regression, so can they be used for regression? Indeed they can: the first example of neural networks in the book “Data Mining Techniques: Second Edition” by Berry and Linoff is estimating the value of a house.
Using standard R packages, this article gives a brief example of regression with neural networks and compares it with multivariate linear regression. The data set contains housing data for 506 census tracts of Boston from the 1970 census, and the goal is to predict the median value of owner-occupied homes (in USD 1000s).
```r
### prepare data ###
library(mlbench)
data(BostonHousing)

# inspect the range, which is 1-50
summary(BostonHousing$medv)

##
## model linear regression
##
lm.fit <- lm(medv ~ ., data=BostonHousing)
lm.predict <- predict(lm.fit)

# mean squared error: 21.89483
mean((lm.predict - BostonHousing$medv)^2)

plot(BostonHousing$medv, lm.predict,
     main="Linear regression predictions vs actual",
     xlab="Actual")

##
## model neural network
##
require(nnet)

# scale the response: divide by 50 to get 0-1 range
nnet.fit <- nnet(medv/50 ~ ., data=BostonHousing, size=2)

# multiply by 50 to restore the original scale
nnet.predict <- predict(nnet.fit) * 50

# mean squared error: 16.40581
mean((nnet.predict - BostonHousing$medv)^2)

plot(BostonHousing$medv, nnet.predict,
     main="Neural network predictions vs actual",
     xlab="Actual")
```
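The response is divided by 50 because nnet's default logistic output unit can only produce values in (0, 1). An alternative is nnet's linout=TRUE argument, which makes the output unit linear so medv can be modeled on its original scale. The sketch below illustrates this; the maxit and decay values are illustrative choices, not tuned, and results will vary from run to run since the weights are initialized randomly.

```r
## alternative: linear output unit instead of rescaling the response
library(nnet)

# linout=TRUE makes the single output unit linear, so the network
# predicts medv directly on its original 1-50 scale
nnet.fit2 <- nnet(medv ~ ., data=BostonHousing, size=2,
                  linout=TRUE, maxit=1000, decay=0.01)

nnet.predict2 <- predict(nnet.fit2)

# mean squared error on the original scale
mean((nnet.predict2 - BostonHousing$medv)^2)
```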
Now, let’s use the train() function from the caret package to optimize the neural network hyperparameters decay and size. caret also performs resampling to give a better estimate of the error. Here linear regression is fit to the same scaled response (medv/50), so the error statistics of the two models are directly comparable.
```r
> library(mlbench)
> data(BostonHousing)
>
> require(caret)
>
> mygrid <- expand.grid(.decay=c(0.5, 0.1), .size=c(4,5,6))
> nnetfit <- train(medv/50 ~ ., data=BostonHousing, method="nnet",
+                  maxit=1000, tuneGrid=mygrid, trace=F)
> print(nnetfit)

506 samples
 13 predictors

No pre-processing
Resampling: Bootstrap (25 reps)

Summary of sample sizes: 506, 506, 506, 506, 506, 506, ...

Resampling results across tuning parameters:

  size  decay  RMSE    Rsquared  RMSE SD  Rsquared SD
  4     0.1    0.0852  0.785     0.00863  0.0406
  4     0.5    0.0923  0.753     0.00891  0.0436
  5     0.1    0.0836  0.792     0.00829  0.0396
  5     0.5    0.0899  0.765     0.00858  0.0399
  6     0.1    0.0835  0.793     0.00804  0.0318
  6     0.5    0.0895  0.768     0.00789  0.0344

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were size = 6 and decay = 0.1.

> lmfit <- train(medv/50 ~ ., data=BostonHousing, method="lm")
> print(lmfit)

506 samples
 13 predictors

No pre-processing
Resampling: Bootstrap (25 reps)

Summary of sample sizes: 506, 506, 506, 506, 506, 506, ...

Resampling results

  RMSE    Rsquared  RMSE SD  Rsquared SD
  0.0994  0.703     0.00741  0.0389
```
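By default, train() estimates error with 25 bootstrap resamples. If you prefer cross-validation, you can pass a trainControl object instead; the sketch below uses repeated 10-fold cross-validation (the fold and repeat counts are arbitrary choices, not from the run above) and shows how to get predictions back on the original scale.

```r
library(caret)

# 10-fold cross-validation repeated 5 times instead of the default bootstrap
ctrl <- trainControl(method="repeatedcv", number=10, repeats=5)

nnetfit.cv <- train(medv/50 ~ ., data=BostonHousing, method="nnet",
                    maxit=1000, tuneGrid=mygrid, trControl=ctrl, trace=FALSE)

# predictions come back on the 0-1 scale; multiply by 50 to restore USD 1000s
preds <- predict(nnetfit.cv, newdata=BostonHousing) * 50
```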
The tuned neural network has an RMSE of 0.0835, compared to linear regression’s RMSE of 0.0994.
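Since both models were fit on medv/50, multiplying the RMSE by 50 puts the error back in the original units (USD 1000s):

```r
0.0835 * 50  # neural network: ~4.18, i.e. roughly $4,180
0.0994 * 50  # linear regression: ~4.97, i.e. roughly $4,970
```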