The MathematicaTip Twitter feed is full of gems, but one caught my eye concerning Alternatives. The post suggested using
Alternatives as a way to find elements of one list that are members of another, e.g.:
Cases[$data, Alternatives @@ $interesting]
I decided to benchmark this against a naive
MemberQ implementation. I was shocked by the results. Consider:
SeedRandom $data = RandomInteger[1000000, 1000000]; $interesting = DeleteDuplicates[RandomInteger[1000000, 1000]]; Length @$interesting // Print Length@Cases[$data, Alternatives @@ $interesting] // Timing // Print
On my machine, running Mathematica 8.0.4, this runs in 0.218 seconds. For comparison, consider this:
Length@Cases[$data, n_ /; MemberQ[$interesting, n]] // Timing // Print
which runs in 58.064 seconds... 266 times slower. What surprises me is not the slowness of the
MemberQ version, but rather the blistering speed of the
Alternatives version. Clearly,
Alternatives is being implemented by a hash lookup or something. This
analysis is further supported by what happens if we increase the size of the "interesting" list from 1,000 to 10,000 elements. The
Alternatives version runs in almost the same time (0.234 s) while
MemberQ predictably runs 10x slower (565.847 s). Even at 1,000,000 interesting values,
Alternatives still runs fast (0.951s).
I suppose that all along my mental performance model of
Alternatives has unfairly equated it to
MemberQ. Also, I've only considered it for small numbers of alternatives. No more.
Alternatives scales up very well.