Fast Data Frame Modification in R

One of the data structures I use most often in R is the data frame. Data frames are similar to matrices, except that each column can hold a different type of variable. I always rejoice when I can reduce the analysis at hand to a matrix of numbers, because R is inherently faster at dealing with matrices than with data frames. When I get data with millions of rows and over a hundred columns, R still works, but it slows down significantly. Lately, I spend most of my time figuring out the fastest way to transform data and perform calculations on big data frames.
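To get a feel for the gap, here is a rough sketch that times the same element-by-element update on a matrix and on a data frame holding identical numbers. The sizes, the loop, and the variable names are my own choices for illustration, and the exact timings will vary by machine and R version.

```r
set.seed(1)

# 10,000 rows x 100 columns of random numbers, stored two ways
m  <- matrix(rnorm(1e6), ncol = 100)
df <- as.data.frame(m)

# Apply the same update to each structure, one element at a time
time_matrix <- system.time(for (i in 1:1000) m[i, 1] <- 0)
time_df     <- system.time(for (i in 1:1000) df[i, 1] <- 0)

# Compare the elapsed times side by side
print(rbind(matrix = time_matrix, data_frame = time_df))
```

Element-wise loops like this are a worst case for data frames, so treat the result as an illustration of the overhead rather than a benchmark.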

Here is a great solution from the folks at RStudio. I had to post this to get the word out.

Introducing dplyr

'dplyr' is a package that builds on the ideas of the plyr package, focuses specifically on data frames, and increases the speed tremendously. I plan to post examples using this new package soon.
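Until then, here is a minimal sketch of what a dplyr pipeline looks like. It assumes dplyr is installed and uses the built-in mtcars data set as a stand-in for a much larger data frame. The column names `wt_kg`, `mean_mpg`, and `mean_wt_kg` are made up for this example.

```r
library(dplyr)  # assumes dplyr is installed: install.packages("dplyr")

# mtcars ships with R; it stands in here for a much larger data frame
by_cyl <- mtcars %>%
  filter(hp > 100) %>%                # keep only rows matching a condition
  mutate(wt_kg = wt * 453.592) %>%    # wt is in units of 1000 lbs; convert to kg
  group_by(cyl) %>%                   # split by number of cylinders
  summarise(
    mean_mpg   = mean(mpg),           # average fuel economy per group
    mean_wt_kg = mean(wt_kg),         # average weight per group
    n          = n()                  # number of cars per group
  )

print(by_cyl)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps chain together with `%>%`.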
