Short Circuiting Logical Statements in R

Short-circuit evaluation is the way ‘lazy’ programming languages evaluate logical statements.  To see whether the programming language you are using is lazy or eager, check the chart on the wiki page: http://en.wikipedia.org/wiki/Short-circuit_evaluation

The lazy (and quicker) way of evaluating logical statements works like this: if any of the logicals joined with an ‘AND’ is false, the whole statement is false.  Conversely, if any of the logicals joined with an ‘OR’ is true, the whole statement is true.  Eager languages evaluate the whole statement before moving on, while lazy languages move on as soon as one of those conditions is hit.

Python is, by default, a lazy language: its ‘and’ and ‘or’ always short-circuit and have no eager version (although you could explicitly program out eagerness in Python).  R has both.  Yup, we should have seen this coming, seeing as R has FIVE different assignment operators (http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html). In R, ‘&’ (AND) and ‘|’ (OR) are the eager operators, and ‘&&’ (AND) and ‘||’ (OR) are the lazy ones. The eager pair also works element by element on whole vectors, while the lazy pair is meant for single TRUE/FALSE values.
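Here is a minimal sketch of the difference, if you want to see it for yourself. The helper `noisy()` is a made-up function (not part of R) that simply records when it gets called, so we can check which side of each operator actually ran.

```r
# Hypothetical helper: logs its label each time it is called, then returns value.
calls <- character(0)
noisy <- function(label, value) {
  calls <<- c(calls, label)  # side effect: remember that this operand ran
  value
}

# Lazy AND: the left side is FALSE, so the right side is never evaluated.
calls <- character(0)
lazy_and <- noisy("left", FALSE) && noisy("right", TRUE)
lazy_and_calls <- calls

# Lazy OR: the left side is TRUE, so the right side is never evaluated.
calls <- character(0)
lazy_or <- noisy("left", TRUE) || noisy("right", FALSE)
lazy_or_calls <- calls

# Eager AND: both sides are evaluated even though the left side is FALSE.
calls <- character(0)
eager_and <- noisy("left", FALSE) & noisy("right", TRUE)
eager_and_calls <- calls

print(lazy_and_calls)
print(lazy_or_calls)
print(eager_and_calls)
```

Both lazy calls should log only "left", while the eager call logs "left" and "right".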

Mandatory Python announcement: the logical operators are simply ‘and’ and ‘or’ (written exactly like that) and are lazy by default.  In my opinion, this is a big advantage that Python has over R: simplicity.

Anyways, back to R.  Let’s try this out to show that these operators do indeed work how they should. Remember that comparing a NULL value with a numeric value in R gives a zero-length logical, and handing that to if() raises an error.

Eager evaluation:


> if(5!=5 & NULL==5){print("This is true!")}else{print("This is false!")}
Error in if (5 != 5 & NULL == 5) { : argument is of length zero

Lazy evaluation:


> if(5!=5 && NULL==5){print("This is true!")}else{print("This is false!")}
[1] "This is false!"

This should raise an eyebrow for a few reasons. Besides being really cool, it is also dangerous. If I’m testing a program with lazy evaluation, it will not raise an error when a condition that got skipped would have compared a NULL value to a numeric value. Hence a good procedure is to test programs with eager evaluation and then switch to lazy evaluation after testing.
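To make the danger a bit more concrete, here is a small sketch of what is going on under the hood. The variable `x` and the result names are just placeholders for illustration, and the `is.null()` guard at the end is one possible way to stay safe, not the only one.

```r
x <- NULL

# Comparing NULL with a number does not fail; it returns a zero-length logical.
cmp <- (x == 5)
length(cmp)  # 0

# The error comes from if(), which needs a condition of length one.
eager_result <- tryCatch(
  if (5 != 5 & x == 5) "true" else "false",
  error = function(e) "error"
)

# With &&, the FALSE on the left means x == 5 is never evaluated at all.
lazy_result <- if (5 != 5 && x == 5) "true" else "false"

# An explicit NULL check in front keeps the comparison from ever seeing NULL.
guarded_result <- if (!is.null(x) && x == 5) "true" else "false"

print(c(eager_result, lazy_result, guarded_result))
```

The guard relies on short circuiting itself: because `!is.null(x)` is FALSE, the comparison after it is skipped.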

Now, I promised an application to big data. When we are working with big data and need to go through and test criteria on a very large data set, it’s fair to say that lazy evaluation will be quicker than eager evaluation. Let’s see by how much.


> test.mat = matrix(rnorm(1000000),nrow=1000)
> system.time(apply(test.mat,1,function(r) sapply(r,function(c) if(c>0.99 & c<0.999){c}else{0})))
user system elapsed
5.27 0.00 5.38
> system.time(apply(test.mat,1,function(r) sapply(r,function(c) if(c>0.99 && c<0.999){c}else{0})))
user system elapsed
4.15 0.00 4.24

On one million elements, testing each one by one for two conditions, we saved a whole second! Now let’s up the ante.


> system.time(apply(test.mat,1,function(r) sapply(r,function(c) if(c>0.99 & abs(c**2+3*c-2)<0.999 & sin(tan(abs(pi*c)))>0.5){c}else{0})))
user system elapsed
11.43 0.00 11.50
> system.time(apply(test.mat,1,function(r) sapply(r,function(c) if(c>0.99 && abs(c**2+3*c-2)<0.999 && sin(tan(abs(pi*c)))>0.5){c}else{0})))
user system elapsed
4.90 0.00 4.98

We saved over half the time! The above example should help point out that the logical statements that are easiest to evaluate should come first, so that the more expensive ones get skipped whenever possible.
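A rough sketch of why the order matters: instead of timing anything, count how often the expensive check actually runs. The function `expensive()` is a made-up stand-in that counts its own calls, and the seed and sample size are arbitrary.

```r
set.seed(1)        # arbitrary seed, only so the run is repeatable
x <- rnorm(10000)

expensive_calls <- 0
# Hypothetical stand-in for a costly condition; it counts how often it is called.
expensive <- function(v) {
  expensive_calls <<- expensive_calls + 1
  sin(tan(abs(pi * v))) > 0.5
}

# Cheap test first: expensive() only runs for the few values above 0.99.
expensive_calls <- 0
for (v in x) if (v > 0.99 && expensive(v)) NULL
calls_cheap_first <- expensive_calls

# Expensive test first: expensive() runs for every single element.
expensive_calls <- 0
for (v in x) if (expensive(v) && v > 0.99) NULL
calls_expensive_first <- expensive_calls

print(c(cheap_first = calls_cheap_first, expensive_first = calls_expensive_first))
```

With the cheap test in front, the expensive function is called once per value above 0.99 rather than once per element.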

As a side note, please do NOT use short-circuiting logical statements when selecting a subset of data. See the following example as to why.


> data = data.frame("a"=c(1,3,5),"b"=c(2,4,6))
> data
  a b
1 1 2
2 3 4
3 5 6
> data.subset = data[data$a==3 && data$b==4,]
> print(data.subset)
[1] a b
<0 rows> (or 0-length row.names)
> data.subset = data[data$a==3 & data$b==4,]
> print(data.subset)
  a b
2 3 4

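You can see that using the short-circuiting logical statements reduces our data set down to nothing. That is because ‘&&’ only looked at the first element of each comparison (1==3, which is false), so the row selector was a single FALSE. Note that R 4.3.0 and later raise an error here instead of returning zero rows. Hence, we need to use eager logical statements when sub-setting data.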
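Here is a small sketch of what each operator hands to the row selector. The names `keep` and `first_only` are just placeholders, and the `tryCatch()` is only there on the assumption that you may be running a newer R that rejects vectors inside ‘&&’.

```r
data <- data.frame(a = c(1, 3, 5), b = c(2, 4, 6))

# & works element by element, so we get one TRUE/FALSE per row.
keep <- data$a == 3 & data$b == 4
keep  # FALSE TRUE FALSE

# && wants single values. Older R quietly used the first elements only
# (giving FALSE here); R 4.3.0 and later raise an error, caught below as NA.
first_only <- tryCatch(
  data$a == 3 && data$b == 4,
  error = function(e) NA
)

# Subset with the element-wise selector.
data.subset <- data[keep, ]
print(data.subset)
```

Either way, ‘&&’ never produces the per-row selector that subsetting needs.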

To summarize, we use lazy evaluation to make programs run quicker with larger data sets, and we use eager evaluation to test programs and for sub-setting data.

 
