2012-04-17

Mathematica Alternatives is Fast

The MathematicaTip Twitter feed is full of gems, but one caught my eye concerning Alternatives. The post suggested using Alternatives as a way to find elements of one list that are members of another, e.g.:

Cases[$data, Alternatives @@ $interesting]

I decided to benchmark this against a naive MemberQ implementation. I was shocked by the results. Consider:

SeedRandom[0]
$data = RandomInteger[1000000, 1000000];
$interesting = DeleteDuplicates[RandomInteger[1000000, 1000]];
Length @$interesting // Print
Length@Cases[$data, Alternatives @@ $interesting] // Timing // Print

On my machine, running Mathematica 8.0.4, this runs in 0.218 seconds. For comparison, consider this:

Length@Cases[$data, n_ /; MemberQ[$interesting, n]] // Timing // Print

which runs in 58.064 seconds... 266 times slower. What surprises me is not the slowness of the MemberQ version, but rather the blistering speed of the Alternatives version. Clearly, Alternatives is being implemented by a hash lookup or something. This analysis is further supported by what happens if we increase the size of the "interesting" list from 1,000 to 10,000 elements. The Alternatives version runs in almost the same time (0.234 s) while the MemberQ predictably runs 10x slower (565.847 s). Even at 1,000,000 interesting values, Alternatives still runs fast (0.951s).

I suppose that all along my mental performance model of Alternatives has unfairly equated it to MemberQ. Also, I've only considered it for small numbers of alternatives. No more. Alternatives scales up very well.