Last week, a colleague showed me a trick in excel. And it really bugs me when you can do something in excel that R can not do easily. So I gave it some thought and came up with a solution. Here's the problem:
Sometimes we have data with missing values and want to fill them in a specific way. What if you want to fill them in with the above-nearest non-missing value? An example of such a data set:
You can imagine that it's easy to do this manually for a small data set, but tedious and horrible for larger data sets. The instructions for excel is this:
- Select data.
- Hit F5 (Go To command)
- Click on the 'Special...' button.
- Select the 'Blanks' option, hit ok.
- Now, while every blank is highlighted, type in "=A2", where A2 is your first non-blank value.
- Hit 'Ctrl + Enter' at the same time.
Viola! Done. This can be done even faster with keyboard shortcuts.
In R, we have to be smart. But it can be done easily with a recursive function. See the code below.
##-----Recursive Fill in------ # This script will attempt to do an excel trick with a recursive R function data = data.frame("label"=c("A",NA,"B",NA,NA,"C",NA,"D",NA,NA,NA,NA,"E",NA,"F")) non.na.index = function(x,data){if(is.na(data[x])){non.na.index(x-1,data)}else{data[x]}} new.column = sapply(1:(dim(data)[1]),function(x) non.na.index(x,data$label))
This R code does the following:
- Creates the data and the missing rows are designated as "NA". This is standard for R.
- non.na.index function takes a column and an index. If that value is NA, it calls itself for the index above it.
- The 'sapply' function is used to apply the function over a whole column.
This runs relatively quick and gives us the same column as the excel trick. Neat-o.
PS- If the data comes this way via pivot tables in Excel, remember you can format the pivot tables certain ways to avoid having to do this at all.