The MathematicaTip Twitter feed is full of gems, but one caught my eye concerning Alternatives. The post suggested using Alternatives
as a way to find elements of one list that are members of another, e.g.:
Cases[$data, Alternatives @@ $interesting]
I decided to benchmark this against a naive MemberQ
implementation. I was shocked by the results. Consider:
SeedRandom[0]
$data = RandomInteger[1000000, 1000000];
$interesting = DeleteDuplicates[RandomInteger[1000000, 1000]];
Length @$interesting // Print
Length@Cases[$data, Alternatives @@ $interesting] // Timing // Print
On my machine, running Mathematica 8.0.4, this runs in 0.218 seconds. For comparison, consider this:
Length@Cases[$data, n_ /; MemberQ[$interesting, n]] // Timing // Print
which runs in 58.064 seconds... 266 times slower. What surprises me is not the slowness of the MemberQ
version, but rather the blistering speed of the Alternatives
version. Clearly, Alternatives
is being implemented by a hash lookup or something. This
analysis is further supported by what happens if we increase the size of the "interesting" list from 1,000 to 10,000 elements. The Alternatives
version runs in almost the same time (0.234 s) while
the MemberQ
predictably runs 10x slower (565.847 s). Even at 1,000,000 interesting values, Alternatives
still runs fast (0.951s).
I suppose that all along my mental performance model of Alternatives
has unfairly equated it to MemberQ
. Also, I've only considered it for small numbers of alternatives. No more. Alternatives
scales up very well.