Benchmarking R, Revolution R, and HyperThreading for data mining

Data mining benchmarks usually measure lift, precision, and the like, but wasted analyst time hurts the ROI of any project too. I recently upgraded my notebook (where I often use R for data mining) and faced two questions: to build models fastest, should I use standard R or Revolution R, and should I enable Hyper-Threading?

Revolution Analytics provides Revolution R, an enhanced distribution of R, along with support, training, and so on. Revolution R is allegedly faster for certain workloads, but I’m skeptical (aren’t all analysts?). It doesn’t help that the site calls it “up-to-date” when Revolution R 4.3 is based on R 2.12 even though R 2.13 is out, or that it claims, “You do not need to modify your existing R code,” which, as we’ll see, is literally true, though code changes for byte compilation and explicit parallelization may help too. Their benchmarks look great for matrix multiplication, but that doesn’t necessarily mean faster machine learning. In any case, I’d rather see for myself.

My new CPU (like many others) supports Hyper-Threading Technology (HTT), which essentially fakes extra CPU cores: that often helps multi-threaded applications, but the data mining algorithms I run are single-threaded. I could run them in parallel with some complication (see the sketch below), but my combination of algorithms and large data sets typically consumes too much RAM. (Sometimes I run R on Amazon Elastic Compute Cloud (EC2) with memory compression to get more memory.)

For data mining, most of the time I spend in R is training classification models with party::ctree() and earth::earth().
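Running those two fits in parallel, as mentioned above, would look roughly like the sketch below. This is only an illustration under assumptions, not something I benchmarked: it uses the parallel package (bundled with R 2.14.0 and later, so not the R versions benchmarked here) and the same crx data set that the benchmark code loads as mydata. Note that each worker is a separate R process with its own copy of the data, which is exactly where the RAM pressure comes from.

# sketch only: fit the two single-threaded models in separate R processes
# assumes 'mydata' has been loaded as in the benchmark code below
library(parallel)
cl <- makeCluster(2)                          # one worker process per model
clusterExport(cl, "mydata")                   # each worker gets its own copy of the data
clusterEvalQ(cl, { library(party); library(earth) })
fits <- parLapply(cl, c("ctree", "earth"), function(fname)
	do.call(fname, list(V16 ~ ., data = mydata)))
names(fits) <- c("ctree", "earth")
stopCluster(cl)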

Benchmark environment

  • Notebook: Dell Latitude E6520
  • CPU: Intel Core i5-2540M dual-core CPU at 2.6GHz with 3M cache (acts like 4-core with HTT)
  • Windows 7 64-bit Enterprise
  • R version 2.13.0 64-bit
  • Revolution R Community version 4.3 64-bit based on R version 2.12.2

Benchmark source code

The code and comments show how to benchmark data mining algorithms in R:

# print system information
R.version
Sys.info()

# install non-core packages
install.packages(c('party', 'rbenchmark', 'earth'))

# load packages
require(rbenchmark)
require(party)
require(earth)
require(rpart)
require(compiler) # R byte code compiler requires R 2.13.0

# function from http://dirk.eddelbuettel.com/blog/2011/04/12/
k <- function(n, x=1) for (i in 1:n) x=1/{1+x}
lk <- cmpfun(k)

# prepare data set from UCI Repository
# see: http://archive.ics.uci.edu/ml/datasets/Credit+Approval
url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data"
mydata <- read.csv(url, header=FALSE)

# run benchmark
benchmark(ctree=ctree(V16 ~ ., data=mydata),
	earth=earth(V16 ~ ., data=mydata),
	rpart=rpart(V16 ~ ., data=mydata),
	k=k(1e6, 1),
	lk=lk(1e6, 1), # comment out this line on Revolution R or any R before 2.13.0
	replications=20
	)

Benchmark results

The usual caveat for benchmarks applies: your mileage may vary with your hardware, operating system, R version, algorithms, and even the data.

First, the results as a table. The values are the time elapsed in seconds:

        R with HTT   R without HTT   Revolution with HTT   Revolution without HTT
ctree       296.23          293.43                187.45                   187.89
earth       130.23          134.57                130.03                   129.90
k            15.54           15.54                 15.38                    15.60
lk            4.19            4.10                   n/a                      n/a
rpart         0.50            0.49                  0.45                     0.47

Now, as a graph:

[Chart of R benchmark results]

Conclusion

Disabling Hyper-Threading has a negligible effect on performance. Revolution R also made little difference for earth() or rpart(), but for party::ctree(), Revolution R finished in about 37% less time.

Revolution R 4.3 does not support byte compilation (hence the n/a entries for lk() above), so it is much slower on that synthetic test; the next release should support it. Not shown here: byte compiling party::ctree() and earth() had little effect, probably because the heavy lifting is done in compiled native code rather than in interpreted R. Explicit parallelization probably would have helped more.
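One way to run that byte-compiling experiment is sketched below; it is an illustration, not the exact code behind the statement above. It assumes R 2.13.0’s compiler package, the packages and mydata from the benchmark code above, and an arbitrary replication count.

# sketch: byte-compile the model-fitting functions and time them against the originals
# (R 2.13.0 or later only; gains little because both packages do their
# heavy lifting in compiled code rather than interpreted R)
ctree_bc <- cmpfun(ctree)
earth_bc <- cmpfun(earth)
benchmark(ctree=ctree(V16 ~ ., data=mydata),
	ctree_bc=ctree_bc(V16 ~ ., data=mydata),
	earth=earth(V16 ~ ., data=mydata),
	earth_bc=earth_bc(V16 ~ ., data=mydata),
	replications=5
	)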

I already used Revolution R on Amazon EC2, and for now, I plan to switch to Revolution R on Windows too. I’ll leave HTT enabled in case it helps other tasks.

12 thoughts on “Benchmarking R, Revolution R, and HyperThreading for data mining”

  1. Thank you for making the comparison.
    I would say that I am not surprised by the results you got for rpart, since the main engine behind it works by calling .Primitive (I think C code, but that needs to be checked).

    Cheers,
    Tal

  2. Great benchmark. It would be nice to extend your benchmark to different operating systems; I would expect differences between Linux and Windows. Furthermore, it would be great to have a time measurement for R in the cloud.

    If you are looking for simple access to cloud computing, you should try cloudnumbers.com. cloudnumbers.com provides researchers and companies with the resources to perform high-performance calculations in the cloud. High-performance computing so far includes serial processing with different main-memory sizes and highly efficient compute clusters with up to 128 instances. We currently focus on the well-known open-source statistics program R (http://www.r-project.org), but further applications are coming soon.

  3. nice benchmark, thank you.

    Markus makes the point about different OSes; another way to test would be to vary the amount of RAM and the data set size relative to the available RAM.

  4. Great bench, dude. I have just one question.

    I have 2 physical cores and 4 with hyperthreading. How do you use hyperthreading within Revolution R? I can only manage to use 50% of my CPU with the function setMKLthreads(2) (above 2 it’s impossible).

    • That may just be a limitation of the way you measure CPU usage. To the operating system and to R, it mostly doesn’t matter whether a CPU is stand-alone (as most were around 2005), a core, hyper-threaded, or virtual (in a virtual machine). What matters is that R runs at least one process per apparent CPU and that each worker has enough work to keep its CPU busy.
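
      For example, the parallel package (bundled with R 2.14.0 and later) can report both counts; a rough sketch:

      library(parallel)
      detectCores()                 # apparent (logical) CPUs, e.g. 4 with HTT enabled
      detectCores(logical = FALSE)  # physical cores, e.g. 2 (NA on platforms where it cannot tell)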

      • Interesting. I’m trying different numbers of cores to work around a problem. I mostly use R + multicore + doMC + foreach to parallelize loops on an Intel i7 processor with 4 cores hyper-threaded to 8. I started with the automatic init, which set the number of cores to 8 as seen with hyperthreading.
        This runs into an intermittent problem with RAM. Most of the time the process consumes about 50% of RAM and fluctuates around that, but now and then it ramps up to 100% RAM and dies the swap death. There is no difference that I can see between the runs that finish perfectly and the ones that die the swap death; it is completely stochastic to me. One remedy has been to reduce the cores to the number of physical cores (4). Now I just wonder whether I need to turn off hyperthreading to have only 4 logical cores, or whether my 4 workers on 4 physical but 8 logical cores are just as fast. Any ideas you have on this would be helpful.

      • With some R functions I’ve had a similar problem using all my Intel i5 cores because of the RAM bottleneck (although I didn’t see the randomness). Based on my benchmark here, HT has a negligible effect on performance, but the best way to find out is to benchmark your hardware using the same R functions, preferably with the same data.

        Other things to keep in mind: RAM is relatively cheap, so make sure you have the maximum amount that fits in your machine. Also, developer time is expensive, so if there is only a small difference between HT and no HT, disable it and avoid the wasted time of dealing with the RAM issue.
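
        As a rough sketch of the remedy you describe, assuming the doMC/foreach setup you already use (the loop body here is just a placeholder):

        library(doMC)
        library(foreach)
        registerDoMC(cores = 4)                              # cap workers at the physical core count
        results <- foreach(i = 1:8) %dopar% sum(rnorm(1e6))  # placeholder workload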

