... until the collector arrives ...

This "blog" is really just a scratchpad of mine. There is not much of general interest here. Most of the content is scribbled down "live" as I discover things I want to remember. I rarely go back to correct mistakes in older entries. You have been warned :)

2012-04-17

Mathematica Alternatives is Fast

The MathematicaTip Twitter feed is full of gems, but one caught my eye concerning Alternatives. The post suggested using Alternatives as a way to find elements of one list that are members of another, e.g.:

Cases[$data, Alternatives @@ $interesting]

I decided to benchmark this against a naive MemberQ implementation. I was shocked by the results. Consider:

SeedRandom[0]
$data = RandomInteger[1000000, 1000000];
$interesting = DeleteDuplicates[RandomInteger[1000000, 1000]];
Length @$interesting // Print
Length@Cases[$data, Alternatives @@ $interesting] // Timing // Print

On my machine, running Mathematica 8.0.4, this runs in 0.218 seconds. For comparison, consider this:

Length@Cases[$data, n_ /; MemberQ[$interesting, n]] // Timing // Print

which runs in 58.064 seconds... 266 times slower. What surprises me is not the slowness of the MemberQ version, but rather the blistering speed of the Alternatives version. Clearly, Alternatives is being implemented by a hash lookup or something. This analysis is further supported by what happens if we increase the size of the "interesting" list from 1,000 to 10,000 elements. The Alternatives version runs in almost the same time (0.234 s) while the MemberQ predictably runs 10x slower (565.847 s). Even at 1,000,000 interesting values, Alternatives still runs fast (0.951s).

I suppose that all along my mental performance model of Alternatives has unfairly equated it to MemberQ. Also, I've only considered it for small numbers of alternatives. No more. Alternatives scales up very well.

Blog Archive