The formattable package is AWESOME

Two things I’ve really missed from Excel, when I moved to using R, were conditional formatting and being able to format numbers as currency or percentages without losing data.

I wrote an as.currency() function a while ago, but it turns numbers into strings, which means that I can no longer math those strings without coercing them back into numbers. And coercion is always scary. Do you want to lose data? Because that’s how you lose data.

The formattable package solves both these problems.

It’s got a couple functions, percent() and currency() that only affect how numbers are PRINTED, so they come out looking proper even on the terminal, but they stay numeric so you can continue to use them as numbers.
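Here’s a quick sketch of what that looks like in practice. The numbers are made up for illustration, and it assumes formattable is installed from CRAN; the point is just that the values print nicely but still behave like numbers.

```r
library(formattable)

# Made-up sample values, purely for illustration
share   <- percent(c(0.10, 0.12, 0.05))
revenue <- currency(c(55000, 36400, 12000))

# Printing shows formatted text (percent signs, currency symbol, commas)
print(share)
print(revenue)

# ...but the underlying values are still numbers, so math keeps working
total   <- sum(revenue)   # add up the currency values
doubled <- share * 2      # arithmetic on the percentages

# Strip the formatting to get the plain numeric values back
as.numeric(total)
as.numeric(doubled)
```

No coercing strings back into numbers, so no chance to lose data along the way.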

It also has a host of conditional formatting options, so you can change the text color of a column based on its values, add data bars (i.e. built-in sparklines within the column itself) or format one column based on the values of another (something I’m pretty sure you can do in Excel, but I’ve never been able to figure out how).

Straight out of the vignette (because I want to get back to playing with this stuff, instead of writing this post):

id  price  rating  market_share  revenue       profit
1   10     5       10.00%        55,000.00     25,300.00
2   15     4       12.00%        36,400.00     11,500.00
3   12     4       5.00%         12,000.00     (8,200.00)
4   8      3       3.00%         (25,000.00)   (46,000.00)
5   9      4       14.00%        98,100.00     65,000.00
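For the conditional formatting side, here’s a rough sketch that rebuilds a data frame with the same values as the table above and styles a few columns. The color choices and which columns get which formatter are my own picks for illustration, not necessarily what the vignette does.

```r
library(formattable)

# Same values as the table above; percent() and accounting() only change printing
products <- data.frame(
  id           = 1:5,
  price        = c(10, 15, 12, 8, 9),
  rating       = c(5, 4, 4, 3, 4),
  market_share = percent(c(0.10, 0.12, 0.05, 0.03, 0.14)),
  revenue      = accounting(c(55000, 36400, 12000, -25000, 98100)),
  profit       = accounting(c(25300, 11500, -8200, -46000, 65000))
)

# Illustrative styling: shaded cells, data bars, and red/green text
styled <- formattable(products, list(
  price        = color_tile("white", "orange"),   # background shade by value
  market_share = color_bar("lightblue"),          # in-cell data bar
  profit       = formatter("span",                # text color by sign
    style = x ~ style(color = ifelse(x >= 0, "green", "red")))
))

styled  # renders as an HTML table in the RStudio Viewer
```

The underlying columns stay numeric, so something like products$profit / products$revenue still works after all this.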

New library for viewing dataframes: javascript datatables

Here’s something that might turn out to be really cool:

RStudio has released a package to integrate R data with the javascript datatables library.

Datatables, not to be confused with the R package by the same name, is a great way to easily make really usable tables online–you get sorting, filtering, pagination all for free without having to write a bunch of nested table/tr/td tags.

Installing this library lets you quickly turn your dataframe into a sortable, filterable table you can really play with. Truth be told, I often dump my data into Excel just before I report on it, for this sort of thing–often I can spot errors more quickly when I can click to re-sort, filter, and so on.

Using datatables to do that sort of filtering might provide a handy alternative to printing data to the console or RStudio’s View() function.

Getting and using DT

Installing the DT library is pretty easy:
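A minimal version, assuming you’re pulling the released package from CRAN (a development build from GitHub would need a different command):

```r
# Install the DT package from CRAN (one-time setup)
install.packages("DT")
```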

From there, you datatable-erize any dataframe with a simple:
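Something along these lines, using the built-in iris data as a stand-in for your own dataframe:

```r
library(DT)

# iris ships with R; swap in your own dataframe here
tbl <- datatable(iris)

tbl  # printing the widget displays the interactive table
```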

Assuming you’re using RStudio, the new table should open in the Viewer pane, sortable and filterable.



The one caveat to be aware of is that using this on big data frames is a bad idea–I tried it on our constituent data (80K rows, 200 columns) and it effectively locked up R and RStudio.

So don’t do that.

For smaller data frames, it’s just fine. I have vague plans to wrap a function around this to get some different defaults (mostly to dump the overly large padding and the serif font).
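A wrapper like that might look roughly like this. It’s only a sketch: the names limit_rows() and view_dt() and the 5,000-row cutoff are hypothetical, and the "compact" class is just one way to trim the padding (it does nothing about the font).

```r
# Hypothetical helper: cap the number of rows so a huge dataframe can't lock up R
limit_rows <- function(df, max_rows = 5000) {
  stopifnot(is.data.frame(df))
  if (nrow(df) > max_rows) {
    warning(sprintf("Showing the first %d of %d rows", max_rows, nrow(df)),
            call. = FALSE)
    df <- utils::head(df, max_rows)
  }
  df
}

# Hypothetical wrapper: truncate first, then hand off to DT with tighter styling
view_dt <- function(df, max_rows = 5000, ...) {
  DT::datatable(limit_rows(df, max_rows), class = "compact", ...)
}
```

Calling view_dt(iris) would then behave like datatable(iris) with less padding, while a call on an 80K-row dataframe would warn and show only the first chunk instead of hanging.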

In any case, enjoy the Christmas present–fun libraries to play with!

Find column names in R with grep

About half the time, when I’m working in R, I’m querying against a denormalized dump of data from our system of record. (If I was a real rockstar, I’d be querying against the database itself, but I’m not because of reasons.)

The worst part about this is that the column names are generally a wreck, a mix of ugly SQL names and overly pretty readable names. And since we’ve flattened the data, there’s a host of calculated columns with names like “AMT_MAX_GIFT_MAX_DT”. Which is hard to get exactly right for all 200 variables.

I want names!

Tl;dr I can never remember what half the names of these columns are. And because R abbreviates the output of str(), I can’t see them in the RStudio sidebar, either. Even if I could, looking through 200 variables would be a colossal pain, so I devised a way to solve that problem.

My grepnames() function makes it easy to find column names in R.

The grepnames() function
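Here’s a minimal sketch of what a function like this could look like, going only on the description in this post: pattern first, then the dataframe, case-insensitive by default, returning the matching names with their indexes. The version in muadc may well differ, and the output column names (index, name) and the sample dataframe are made up for illustration.

```r
# Sketch only: find columns whose names match a regular expression
grepnames <- function(pattern, df, ignore.case = TRUE, ...) {
  stopifnot(is.data.frame(df))
  # positions of the matching column names
  idx <- grep(pattern, names(df), ignore.case = ignore.case, ...)
  # return both the position and the name of each match
  data.frame(index = idx, name = names(df)[idx], stringsAsFactors = FALSE)
}

# Made-up dataframe with the usual mix of ugly and pretty names
gifts <- data.frame(DONOR_CD = 1, AMT_MAX_GIFT_MAX_DT = 2, donor_name = "a")

matches <- grepnames("donor", gifts)
matches  # one row per matching column
```

Returning a dataframe rather than a bare vector means the index and the name sit side by side, which is handy when you want to subset by position afterwards.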

Using grepnames()

You use grepnames() like you would grep: you pass it a regular expression and a dataframe, and it returns a dataframe with column names that match the regular expression and their respective column indexes. For example, grepnames("donor", df) lists every column whose name contains “donor”, along with its position.

This isn’t much different than doing grep("foo", names(df)), but it’s less typing and if you mistype, you won’t end up locking up R. Also, the output is slightly more informative.
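For comparison, the plain base R route looks like this (the dataframe is a made-up stand-in):

```r
# Made-up dataframe for illustration
df <- data.frame(DONOR_CD = 1, AMT_MAX_GIFT_MAX_DT = 2, donor_name = "a")

# grep() alone gives only the positions of matching names
idx <- grep("donor", names(df), ignore.case = TRUE)

# To see the names you either index back into names(df)...
by_index <- names(df)[idx]

# ...or ask grep() for the values instead of the positions
by_value <- grep("donor", names(df), ignore.case = TRUE, value = TRUE)
```

You get either the positions or the names from a single call, not both, and you have to remember ignore.case every time.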

By default, it’s not case sensitive – I’m working on the assumption that there’s no telling what a column is named, so trying to get the case right would just be a pain. Plus, you’re rarely doing complicated regular expressions – most often I end up passing it “donor” because I can’t remember how the donor code column is titled.

An R package with grepnames()

This function is part of my muadc package for R, which is on GitHub. It’s mostly an assortment of convenience functions, stuff I found myself doing over and over and so wrote functions for. If you have the devtools package installed, you can install it by doing devtools::install_github("crazybilly/muadc").

There are a couple of functions that will be useless for you (they’re specific to our office), but a few of them, like grepnames(), are pretty handy.

My eventual plan is to build out a full package for higher education fundraising (with a sample data set and some legit documentation) and submit it to CRAN, but I’ll need a bit more time to make that happen.

Until then, happy grepping!