# Comparing Dr. Who episodes by decade

A little while ago, io9 rated every Dr. Who episode from best to worst.

I immediately noticed that a bunch of their favorites were from the reboot, despite the fact that there’s a lot more content in the older series. So I decided to pull the data into R to see if I was imagining things. I know this isn’t fundraising-related, but it IS R-related, and it was a fun project to work on over lunch.

Here’s a plot of all the episodes with year on the x-axis and rank on the y-axis. Remember that a higher rank number is worse (rank 1 is io9’s favorite).

    library(ggplot2)
    ggplot(io9, aes(year, rank)) + geom_point() + geom_smooth()

    ## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

It definitely LOOKS like the new stuff is better, but I’ll bet we can know more.

Here’s what I found after grouping the episodes by rounding each year to the nearest decade (e.g., 1953 becomes 1950; 1958 becomes 1960):
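The rounding step itself isn’t shown in this post. A minimal sketch, assuming a numeric `year` column, is base R’s `round()` with a negative `digits` argument. The `toy` data frame below is made up for illustration; it is not the io9 data.

```r
# Hypothetical stand-in for the io9 data: only a year column is needed here
toy <- data.frame(year = c(1953, 1958, 1963, 1989, 2007))

# digits = -1 rounds to the nearest ten, i.e. the nearest decade
toy$roundedyr <- round(toy$year, -1)

toy
```

Years ending in 5 sit exactly halfway between two decades, so it’s worth checking how `round()` treats them in your own data before relying on this.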

### Let’s Look at Averages

In terms of mean rank, the reboot was WAY better than everything (and the 1990s stuff had an overall average rank that was much worse):

    library(dplyr)
    io9 %>%
      group_by(roundedyr) %>%
      summarize(`avg rank` = mean(rank))

    ## Source: local data frame [6 x 2]
    ##
    ##   roundedyr  avg rank
    ##       (dbl)     (dbl)
    ## 1      1960 155.10526
    ## 2      1970 149.67273
    ## 3      1980 140.65672
    ## 4      1990 175.62500
    ## 5      2000  69.75000
    ## 6      2010  95.70588

You can see there that, on average, the 2000s were the best and the 1990s were the worst.

### T-test to really see

But this is just averages–seems like we can do better. I mean, there could be an outlier sitting out there yanking those averages down (or up, as it were).

With that in mind, I ran a t.test to compare each decade to every other one:

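    library(tidyr)  # spread(); dplyr is assumed to be loaded already

    # Decades present in the data, in ascending order
    roundedyrs <- sort(unique(io9$roundedyr))

    # For each decade i, t-test every decade x against it and keep the p-value
    tvalsroundedyrlong <- lapply(roundedyrs, function(i) {
      data.frame(roundedyrs,
                 pval = sapply(roundedyrs, function(x) {
                   results <- t.test(io9$rank[io9$roundedyr == x],
                                     io9$rank[io9$roundedyr == i])
                   results$p.value
                 }),
                 comparedto = i)
    }) %>% bind_rows  # bind_rows() replaces the deprecated rbind_all()

    # One row per decade, one column per comparison; TRUE where p <= .05
    tvalsroundedyrlong %>%
      mutate(pvalok = pval <= .05) %>%
      select(-pval) %>%
      spread(comparedto, pvalok)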

    ## Source: local data frame [6 x 7]
    ##
    ##   roundedyrs  1960  1970  1980  1990  2000  2010
    ##        (dbl) (lgl) (lgl) (lgl) (lgl) (lgl) (lgl)
    ## 1       1960 FALSE FALSE FALSE FALSE  TRUE  TRUE
    ## 2       1970 FALSE FALSE FALSE FALSE  TRUE  TRUE
    ## 3       1980 FALSE FALSE FALSE FALSE  TRUE  TRUE
    ## 4       1990 FALSE FALSE FALSE FALSE  TRUE  TRUE
    ## 5       2000  TRUE  TRUE  TRUE  TRUE FALSE FALSE
    ## 6       2010  TRUE  TRUE  TRUE  TRUE FALSE FALSE

Here I’m asking whether the p-values are less than or equal to .05. Or to put it another way: if two decades were really no different, would a gap this big between their averages turn up less than 5% of the time? TRUE values are where the answer is yes, meaning the difference is statistically significant at the 5% level.
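To make the comparison concrete, here’s a small sketch on made-up ranks (the `toy` data frame is hypothetical, not the io9 data). It runs one pair through `t.test()`, which uses Welch’s unequal-variance test by default, and then all pairs through base R’s `pairwise.t.test()`. Setting `pool.sd = FALSE` and `p.adjust.method = "none"` is my choice here, to keep the results in line with separate, unadjusted `t.test()` calls.

```r
# Made-up ranks for three hypothetical decades (not the io9 data)
toy <- data.frame(
  roundedyr = rep(c(1960, 1970, 2000), each = 6),
  rank = c(150, 160, 140, 170, 155, 145,   # "1960"
           155, 165, 145, 175, 160, 150,   # "1970": the 1960 ranks shifted by 5
           60, 70, 50, 80, 65, 55)         # "2000": much lower (better) ranks
)

# One pair: t.test() defaults to Welch's test (var.equal = FALSE)
p_one_pair <- t.test(toy$rank[toy$roundedyr == 1960],
                     toy$rank[toy$roundedyr == 2000])$p.value

# All pairs at once; pool.sd = FALSE and p.adjust.method = "none"
# keep the results comparable to separate, unadjusted t.test() calls
all_pairs <- pairwise.t.test(toy$rank, toy$roundedyr,
                             p.adjust.method = "none", pool.sd = FALSE)

# Lower-triangular matrix of TRUE/FALSE flags, same .05 cutoff as above
all_pairs$p.value <= .05
```

With this many pairwise tests, some people prefer to leave `p.adjust.method` at its default (`"holm"`), which corrects for running several comparisons at once.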

### Old Stuff Was A Crapshoot

The other thing you can see is that the old stuff was all over the map. You can see this in the plot above, and the t-tests back it up: there’s not a statistically significant difference in ranks between the 60s, 70s, 80s and 90s. I found that surprising–everybody knows that stuff at the very end of the first series was awful, right? Not io9, apparently.

In any case, if you’re introducing your friends to Dr. Who, start with the reboot–that old stuff is like a box of chocolates….