Bind_rows() instead of rbind_all() is awesome

1 Comment

I use dplyr’s rbind_all() all the time to mash multiple data frames together–most of the time, I’ll pull multiple data sets, but then stack them on top of each other to do some sort of summary, something like:

Today, I happened to look at the documentation for rbind_all() and found out it’s been deprecated in favor of bind_rows().

Ok whatever, I thought. Then I happened to look at the arguments for bind_rows( ... , .id = NULL).

Here’s the super cool part: you pass .id a string that becomes the name of a new column in the new data frame. And that column is populated by either a sequence of numbers (in whatever order you passed the data frames you’re binding together) or (and here’s where the real money at), you can name your data frames in the first place. Like this:

Normally, I end up adding a source column or some such each of the smaller individual data frames before I bind them all together. It’s a pain and a good way to screw something up–I often forget to add the column till it’s too late and then I have to go back, make the edit and re-run the code. This solves the problem, and rather elegantly so.