Data frames are lists

No Comments

I’ve got another more expansive post in the hopper that I need to edit and post, but here’s a quick one to reiterate a point that you probably already know, but is really important:

Data Frames are lists

This point, that data frames are lists where each column of the data frame is an item (usually a named vector) within the list, is really really useful. Here’s some things this tidbit lets you do:

  1. Grab a single column as a vector with double brackets, ala constituents[[8]]. If you use single brackets, you’ll get a data frame with just that column (because you’re asking for a list with the 8th item in this case).
  2. Do lapply and sapply functions over each column, ie. you can easily loop through a data frame, performing the same function on each column.

    You might think this isn’t a big deal. “Normally the columns of my data frames are all different types of data, first name here, lifetime giving there,” you might say.

    But trust me, it won’t be long til you’ve got a data frame and you think “oh, I’ll just write a little for loop to loop through each column and…” That’s what lapply and/or sapply are for. Knowing that the scaffolding to do that task is already built into one handy function will save your bacon.

  3. Converting a list of vectors to a data frame is as simple as saying as.data.frame(myOriginalList). And while lists can be a bit fiddly, with this one, you know what you’re going to get.

In short, just knowing that a data frame is a special kind of list makes it a lot easier to handle data frames when the need for crazy sorts of data manipulation comes up (and trust me, it will).

Categories: data manipulation Tags: Tags: