The formattable package is AWESOME

Two things I’ve really missed from Excel, when I moved to using R, were conditional formatting and being able to format numbers as currency or percentages without losing data.

I wrote an as.currency() function a while ago, but it turns numbers into strings, which means that I can no longer do math on those values without coercing them back into numbers. And coercion is always scary. Do you want to lose data? Because that’s how you lose data.

The formattable package solves both these problems.

It’s got a couple of functions, percent() and currency(), that only affect how numbers are PRINTED, so they come out looking proper even on the terminal, but they stay numeric so you can continue to use them as numbers.
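Here’s a quick sketch of what that looks like in practice. The numbers are made up for illustration, and I’m assuming the package’s default formatting settings; the last two lines show the string-based approach for contrast.

```r
library(formattable)

# Made-up values, purely for illustration
share   <- percent(c(0.10, 0.12, 0.05))
revenue <- currency(c(55000, 36400, 12000))

# Printing shows formatted text on the console
print(share)
print(revenue)

# ...but the underlying values are still numbers
is.numeric(share)

# Arithmetic works without any coercion step
bumped <- share + 0.05
total  <- sum(revenue)

# Contrast: base format() returns character strings, so the math stops here
old_way <- format(c(55000, 36400), big.mark = ",", nsmall = 2)
is.character(old_way)
```

If you ever need the plain number back, as.numeric() strips the formatting off.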

It also has a host of conditional formatting options, so you can change the text color of a column based on its values, add data bars (i.e. built-in sparklines within the column itself) or format one column based on the values of another (something I’m pretty sure you can do in Excel, but I’ve never been able to figure out how).
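A minimal sketch of those three tricks, using a tiny data frame I invented for the purpose (the sales table and its column names are hypothetical). As I understand the package docs, a two-sided formula like x ~ ... gets the column’s own values, while a one-sided formula like ~ ... can refer to other columns in the data frame; treat that reading as an assumption and check the formatter() help page.

```r
library(formattable)

# Hypothetical data, invented for illustration
sales <- data.frame(
  region  = c("North", "South", "East", "West"),
  revenue = c(55000, 36400, 12000, 98100),
  change  = c(0.10, -0.04, 0.02, -0.01),
  stringsAsFactors = FALSE
)

styled <- formattable(sales, list(
  # Text color driven by the column's own values
  change = formatter("span",
    style = x ~ style(color = ifelse(x >= 0, "green", "red"))),
  # Data bars drawn inside the column
  revenue = color_bar("lightblue"),
  # One column formatted by another: bold the regions whose change is negative
  region = formatter("span",
    style = ~ style(font.weight = ifelse(change < 0, "bold", "normal")))
))

# In RStudio or an R Markdown document this renders as an HTML table
styled
```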

Straight out of the vignette (because I want to get back to playing with this stuff, instead of writing this post):

id  price  rating  market_share      revenue       profit
 1     10       5        10.00%    55,000.00    25,300.00
 2     15       4        12.00%    36,400.00    11,500.00
 3     12       4         5.00%    12,000.00   (8,200.00)
 4      8       3         3.00%  (25,000.00)  (46,000.00)
 5      9       4        14.00%    98,100.00    65,000.00
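For reference, here’s roughly how a table like that can be built. This is my reconstruction from the five rows shown above, not a copy of the vignette’s code; the data frame name products is my own, and I’m assuming accounting() for the last two columns since that’s the formattable function that prints negatives in parentheses.

```r
library(formattable)

# Values taken from the five rows printed above
products <- data.frame(
  id           = 1:5,
  price        = c(10, 15, 12, 8, 9),
  rating       = c(5, 4, 4, 3, 4),
  market_share = percent(c(0.10, 0.12, 0.05, 0.03, 0.14)),
  revenue      = accounting(c(55000, 36400, 12000, -25000, 98100)),
  profit       = accounting(c(25300, 11500, -8200, -46000, 65000))
)

print(products)

# The formatted columns are still numbers, so this just works
total_profit <- sum(products$profit)
```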

Getting started – use dplyr

If you’ve already installed R and RStudio, there’s one more thing you’re going to need before you really get started using R for predictive modeling for fundraising: dplyr.

dplyr is an R package (which is to say “add-on code”) that makes using R for basic data manipulation substantially less painful.

It’s dangerous to go alone. Take this. [offers dplyr].

To get dplyr, fire up R and then type

install.packages('dplyr')

When that finishes, the package will be installed but not loaded, so do

library(dplyr)

Write like you think

dplyr provides two major advantages. The first is that it allows you to write code the way you think, namely starting from a set of data and working towards the final result.

There’s a host of articles online about how great this is, so I’m not going to spell it out for you. Suffice it to say, it makes thinking through and solving problems a lot easier.
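A small sketch of the difference, using the %>% pipe that comes along when you load dplyr. The donors data frame is hypothetical, invented just for this example.

```r
library(dplyr)

# Hypothetical donor data, invented for illustration
donors <- data.frame(
  name         = c("Avery", "Blake", "Casey", "Devon"),
  state        = c("OH", "OH", "MI", "OH"),
  total_giving = c(250, 1200, 75, 40),
  stringsAsFactors = FALSE
)

# Nested style: you have to read it from the inside out
top_nested <- head(arrange(filter(donors, state == "OH"), desc(total_giving)), 2)

# Piped style: start with the data, then read each step top to bottom
top_piped <- donors %>%
  filter(state == "OH") %>%        # keep the Ohio donors
  arrange(desc(total_giving)) %>%  # biggest givers first
  head(2)                          # take the top two
```

Both versions give the same two rows; the piped one just reads in the order you’d describe the problem out loud.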

A grammar of data manipulation

The other thing that dplyr does very well is provide a sort of grammar of data manipulation, specifically a set of verbs that you typically use to solve most common data problems, stuff like sorting, rearranging and renaming columns, adding new calculated columns, etc.

The author of dplyr, the inestimable Hadley Wickham, has a great tutorial on these verbs.

Learning these 5 functions (plus just a couple more like ifelse and grepl, which I’ll cover in later posts) will solve the overwhelming majority of the tedious sorts of data manipulation tasks you’ll find yourself doing every time you start in a data job.
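To give a flavor of what that looks like, here’s a sketch that assumes the five verbs are the ones usually listed in dplyr’s introductory material: filter(), arrange(), select(), mutate() and summarise(). The gifts data frame and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical gift data, invented for illustration
gifts <- data.frame(
  donor  = c("Avery", "Blake", "Casey", "Devon"),
  fund   = c("Annual Fund", "Athletics", "Annual Fund", "Library"),
  amount = c(250, 1200, 75, 40),
  stringsAsFactors = FALSE
)

result <- gifts %>%
  filter(amount >= 50) %>%                                  # keep rows that match a condition
  mutate(
    level     = ifelse(amount >= 1000, "major", "regular"), # new calculated column
    is_annual = grepl("Annual", fund)                       # TRUE where the fund name matches
  ) %>%
  arrange(desc(amount)) %>%                                 # sort, largest gift first
  select(donor, amount, level, is_annual)                   # pick and reorder columns

# summarise() collapses the whole table down to one row
summary_row <- gifts %>%
  summarise(total = sum(amount), average = mean(amount))
```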

Categories: Getting started