Recently, I’ve been working on some machine learning projects at work. A lot of these require serious computing time: my work machine isn’t super beefy, but it’s no slouch, and still some of these models take a couple of hours to build.
Without really thinking about it, I assumed I was using all my processing power. And then, I looked at my Task Manager. And it turns out, of the 4 cores I’ve got, R was only using one!
Getting multi-core support in R
A bit of research revealed that R is really bad at supporting multiple cores: out of the box, it runs your code on a single core. It’s baffling to me, but that’s the way it is. There are various solutions to this, but they involve installing and using packages and then making sure your processes can actually be parallelized. Sounds like a recipe for disaster if you ask me. I screw enough up on my own; I don’t need to add a layer of complexity on top of that.
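For comparison, here is roughly what the package route looks like. This is a minimal sketch, not part of the setup described in this post: it assumes Windows (so it uses a socket cluster from `makeCluster()` rather than forking), uses only the `parallel` package that ships with R, and `slow_task()` is a hypothetical stand-in for one independent model fit. It only pays off when the pieces of work really are independent of each other, which is exactly the extra bookkeeping mentioned above.

```r
# Minimal sketch: run independent jobs on several cores with base R's 'parallel' package.
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1 worker.
# Leaving one core free keeps the machine responsive.
workers <- max(1L, detectCores() - 1L, na.rm = TRUE)

# Hypothetical stand-in for one slow, independent model fit.
slow_task <- function(i) {
  sum(sqrt(seq_len(1e6))) + i
}

# makeCluster() starts separate R worker processes (socket-based, so it works on Windows).
cl <- makeCluster(workers)
results <- tryCatch(
  parLapply(cl, 1:8, slow_task),  # each element is sent to one of the workers
  finally = stopCluster(cl)       # always shut the workers down, even on error
)

length(results)
```

Keeping one core free is a common habit rather than a rule; on a dedicated machine you could use all of them.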
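An alternative, easier solution is to use Revolution Analytics’ distribution of R, Revolution R Open (Open R for short), which adds multi-core support for math-heavy work such as matrix algebra by using Intel’s multi-threaded Math Kernel Library (MKL).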
Just download and install it, and the next time you fire up RStudio it’ll find it and (probably) start using it. If not, you can go into Global Options in RStudio and tell it to use that version of R.
Now my packages won’t update
Open R seems to run just fine, but a couple of weeks in, I realized I had a problem: my packages weren’t updating (and I really wanted the newest version of dplyr!).
Turns out, Open R doesn’t update packages from the live CRAN by default. Instead, it points at a snapshot of CRAN taken on a fixed date, so packages don’t get updated and break things halfway through a project.
This doesn’t really bother me: I’m rarely spending more than a couple of weeks on a single project, nor do I have any massive dependencies that would break between upgrades. So I followed the instructions in the Open R FAQ to point my package repository back at CRAN (basically, you just need to edit your Rprofile.site).
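A minimal sketch of that edit, assuming Open R selects its snapshot through the usual `options(repos = ...)` idiom in `Rprofile.site`; the exact lines it ships with may differ, so compare against the FAQ and your own file. The file lives in the directory returned by `R.home("etc")`.

```r
# Add to (or adjust in) Rprofile.site, in the directory returned by R.home("etc").
# Assumption: Open R sets its fixed-date snapshot via options(repos = ...);
# this overrides the CRAN entry so packages come from current CRAN instead.
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cran.r-project.org"  # live CRAN instead of the dated snapshot
  options(repos = r)
})
```

After restarting R, `getOption("repos")` should show the CRAN URL, and `update.packages(ask = FALSE)` should then pull the newest versions.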
And sure enough, I was back in business with dplyr 0.4 (and a host of newly updated packages)!