More and more I'm using the 'apply' functions in R (apply, mapply, sapply, lappy,...). The help functions on these are hard to decipher. Not till I read a post by Neil Saunders, did I really start using them instead of loops.
Lately, I've been creating more nested-apply functions, i.e., an apply function within an apply function. The latest nested apply function I created does something really neat with specific data. What it does is create a list of indices in a sparse matrix down columns of blocks of numbers. Let's look at a simple example.
Let's say here is our raw data:
What we are looking to create in R is a list of the block indices of non-zero numbers by column. Here is the function with it's output:
The output we seek is a list containing the indices for each non-zero block. For example we want the output for column A to look like:
A[] = 1 2 3 (because the first three positions in column A are non-zero and compromise one block of numbers.
Column B has two blocks, so the output should be:
B[] = 1 2
B[] = 5 6
Here's the nested apply function to create this list (lapply nested inside an apply on the columns).
Let's break it down step by step. The first thing is to find the non-zero indices. (Let's perform this on the second column, B)
Remember R reads down columns. Now let's find where non-consecutive differences occur (to separate the blocks).
Notice, wherever there are non-ones is where the break occurs. But because of how the 'diff' function works, we need to add a position in front. Let's also convert them to logicals.
If we do a cumulative sum of these logicals (remember FALSE = 0, and TRUE = 1) we will have separated the blocks with integers starting at 1.
If we add one and find the unique ones, we get:
Now let's wrap it up in a 'lapply' function to return a list for column B:
Now we just stick it in an apply function across the columns to find the indices of blocks of non-zero numbers. (Or across rows if you want.)
Sometimes I get so proud of my nested apply functions... sniff... sniff. One day I hope to write a triple nested apply function. I hope someone finds this useful.