# Using Cut to Categorize Wealth Capacity

Occasionally, I want to categorize wealth capacity into different bins. For example, capacity (or giving, or any other numeric variable) should fall into bins from 1 to 12. (We've always used WealthEngine's categories here in the office, even though, incomprehensibly, it's a 12-point scale instead of 10 and the lowest number is the best rating instead of the worst.)

In Excel, I’d turn a number into a categorical variable (or factor, as R calls them) by using a lookup table and a bunch of nested IF statements. You can do the same with ifelse() in R if you want, but I’d recommend using cut() instead.

To start, you’ll want something like this:

```r
cut(df$capacityrating,
    breaks=c(0,100,500,1000,5000,10000,50000,100000,500000,1000000,Inf),
    labels=c("J","I","H","G","F","E","D","C","B","A"), right=FALSE)
```
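The following sketch runs that call on a few made-up capacities so you can see which letter each one gets. The `df` contents and the `capacitycode` column name are hypothetical stand-ins for your own donor file.

```r
# Hypothetical sample data standing in for a real donor file
df <- data.frame(capacityrating = c(0, 99, 100, 2500, 75000, 1000000, 3e6))

# Store the result as a new (hypothetical) factor column
df$capacitycode <- cut(df$capacityrating,
                       breaks = c(0, 100, 500, 1000, 5000, 10000, 50000,
                                  100000, 500000, 1000000, Inf),
                       labels = c("J", "I", "H", "G", "F", "E", "D", "C", "B", "A"),
                       right = FALSE)

# 0 and 99 fall in [0,100) -> "J"; 100 starts [100,500) -> "I";
# 2500 -> "G"; 75000 -> "D"; 1000000 and above -> "A"
df
```

The result is a factor whose levels follow the order of `labels`, so "J" is the first level and "A" the last.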

That should get you pretty close to where you want to go. A few notes about what’s going on here.

- You're going to need to specify breaks, i.e., the minimum amount for each bin, plus a final upper edge. If we're talking about giving or wealth capacity, this means 0 and Inf should be the bottom and top of your breaks. The minimum of the worst bin is a capacity of $0, and without a top break at infinity, R has no upper edge for your best bin, so your very best donors would come back as NA instead of landing in a bin.
- You'll need one fewer label than break points, because n breaks only mark out n - 1 bins. The bottom break (0) and the top break (Inf) are outer edges, not bins of their own: nobody has a wealth capacity below zero or above infinity.
- You're also going to want to specify `right=FALSE`. This tells R that you're giving it the minimum for each bin rather than the maximum (i.e., each bin is closed on the "left" side instead of the right, so a capacity of exactly $100 lands in the bin that starts at 100).
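Here is a small sketch of what the notes above change in practice. It uses hypothetical three-bin breaks and "low/mid/high" labels purely for illustration and compares `right = FALSE` with R's default `right = TRUE`. It also shows what happens when the top `Inf` break is left off.

```r
breaks <- c(0, 100, 500, Inf)          # hypothetical bin minimums plus Inf as the top edge
labels <- c("low", "mid", "high")      # one fewer label than breaks
x <- c(0, 100, 499.99, 500)

# right = FALSE: bins are [0,100), [100,500), [500,Inf), so each break is a bin's minimum
left_closed <- cut(x, breaks = breaks, labels = labels, right = FALSE)
# values: low, mid, mid, high

# right = TRUE (the default): bins are (0,100], (100,500], (500,Inf]
# 0 matches no bin, and 100 and 500 drop into the lower bin
right_closed <- cut(x, breaks = breaks, labels = labels)
# values: NA, low, mid, mid

# Without Inf as the last break, anything at or above the top minimum has no bin
no_top <- cut(c(50, 750), breaks = c(0, 100, 500), labels = c("low", "mid"), right = FALSE)
# values: low, NA
```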

To sum up: if you're making a scale with 12 bins, you need 12 labels and 13 break points (with 0 and Inf on either end of that list of breaks). The example above has 10 labels and 11 breaks.
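If you'd rather not count breaks by hand, one option is a small wrapper that appends `Inf` for you and checks the arithmetic before calling `cut()`. The helper `bin_by_minimum()` below is hypothetical, and the twelve minimums are placeholder thresholds, not WealthEngine's actual cutoffs. The labels run 12 down to 1 so that, as on that scale, the lowest number is the best.

```r
# Hypothetical helper: takes each bin's minimum (lowest first) and one label per bin,
# appends Inf as the top edge, and checks that labels = breaks - 1 before cutting.
bin_by_minimum <- function(x, mins, labels) {
  breaks <- c(mins, Inf)                          # n minimums + Inf = n + 1 breaks
  stopifnot(length(labels) == length(breaks) - 1,
            !is.unsorted(mins, strictly = TRUE))  # minimums must be strictly ascending
  cut(x, breaks = breaks, labels = labels, right = FALSE)
}

# Placeholder 12-bin scale: 12 minimums, 12 labels, 13 breaks once Inf is added
mins <- c(0, 100, 250, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 1000000)
tiers <- bin_by_minimum(c(0, 30000, 1e6), mins, labels = 12:1)
# 0 -> "12" (worst), 30000 falls in [25000, 50000) -> "4", 1e6 -> "1" (best)
```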

### Using a lookup table to specify capacity ratings

If you've already got your ratings in a table (I'd be willing to bet you've got something in a spreadsheet somewhere), you can read that into a data frame (named `ratings` below, with each bin's minimum in a `minAmt` column and its label in a `code` column) and use it like this:

```r
cut(df$capacityrating, breaks=c(ratings$minAmt, Inf), labels=ratings$code, right=FALSE)
```
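To make that call concrete, this sketch builds a `ratings` table by hand in the shape the call expects, using the same thresholds as the first example. In practice the table would come from your spreadsheet. The sample capacities and the `capacitycode` column name are hypothetical.

```r
# Hypothetical lookup table, already in ascending order of minAmt
ratings <- data.frame(
  minAmt = c(0, 100, 500, 1000, 5000, 10000, 50000, 100000, 500000, 1000000),
  code   = c("J", "I", "H", "G", "F", "E", "D", "C", "B", "A")
)

df <- data.frame(capacityrating = c(250, 7500, 2e6))

# Breaks are the table's minimums plus Inf; labels come straight from the code column
df$capacitycode <- cut(df$capacityrating,
                       breaks = c(ratings$minAmt, Inf),
                       labels = ratings$code,
                       right = FALSE)
# 250 -> "I", 7500 -> "F", 2e6 -> "A"
```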

The only trick here is that you need to sort the table so your minimums are in ascending order: your worst category at the top of the table and your best, wealthiest category at the bottom. `cut()` sorts the breaks on its own, but it applies the labels in the order you give them, so an unsorted table silently attaches labels to the wrong bins.
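A minimal sketch of that pitfall follows, using a hypothetical three-row table typed best-first, the way it might sit in a spreadsheet. Sorting the rows with `order()` before calling `cut()` keeps each code attached to its own minimum.

```r
# Hypothetical lookup table typed best-first (descending minimums)
ratings <- data.frame(minAmt = c(1000, 100, 0), code = c("A", "B", "C"))

# cut() sorts the breaks to 0, 100, 1000, Inf but keeps labels as A, B, C,
# so the lowest bin wrongly gets "A"
unsorted_codes <- cut(c(50, 5000), breaks = c(ratings$minAmt, Inf),
                      labels = ratings$code, right = FALSE)
# values: A, C (wrong)

# Sort the rows by minimum first; the codes move along with their minimums
ratings <- ratings[order(ratings$minAmt), ]
sorted_codes <- cut(c(50, 5000), breaks = c(ratings$minAmt, Inf),
                    labels = ratings$code, right = FALSE)
# values: C, A (right)
```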

### Using cut() for other categorizations

That's all there is to using `cut()` to categorize wealth capacity ratings into bins, but you can probably see how to apply it to other numeric columns (to lifetime giving, for example, to come up with rough categorical outcome variables).
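As one illustration, the sketch below turns a lifetime giving column into giving tiers and counts donors per tier with `table()`. The `donors` data, the `lifetimegiving` and `givingtier` column names, and the tier boundaries are all hypothetical. A break at 1 separates never-givers from everyone who has given anything.

```r
# Hypothetical donor file with a lifetime giving column
donors <- data.frame(lifetimegiving = c(0, 40, 600, 12000, 25))

donors$givingtier <- cut(donors$lifetimegiving,
                         breaks = c(0, 1, 100, 1000, 10000, Inf),
                         labels = c("Never given", "Under $100", "$100-$999",
                                    "$1,000-$9,999", "$10,000+"),
                         right = FALSE)

# Count donors in each tier; empty tiers still appear with a count of 0
table(donors$givingtier)
```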

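I'd probably end up knocking together the lookup table in Excel and then using `read.table(file="clipboard", sep="\t", header=TRUE)` to read it into R from the clipboard. `header=TRUE` keeps the first row as column names. `file="clipboard"` works on Windows; on a Mac, `pipe("pbpaste")` in its place reads the clipboard.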
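Clipboard contents aren't reproducible, so this sketch pastes a short, hypothetical tab-separated table in as a string and reads it with `read.table(text = ...)` instead. With real data copied from Excel, you'd pass `file = "clipboard"` in place of `text = lookup_text`.

```r
# Hypothetical tab-separated lookup table, as Excel would put it on the clipboard
lookup_text <- "minAmt\tcode\n0\tJ\n100\tI\n500\tH"

# header = TRUE turns the first row into the column names minAmt and code
ratings <- read.table(text = lookup_text, sep = "\t", header = TRUE)

# The table is already in ascending order, so it can go straight into cut()
codes <- cut(c(50, 750), breaks = c(ratings$minAmt, Inf),
             labels = ratings$code, right = FALSE)
# 50 -> "J", 750 -> "H"
```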

From there, it’s just a matter of running cut() on the data you want to transform!