... until the collector arrives ...

This "blog" is really just a scratchpad of mine. There is not much of general interest here. Most of the content is scribbled down "live" as I discover things I want to remember. I rarely go back to correct mistakes in older entries. You have been warned :)

2012-04-17

Mathematica Alternatives is Fast

The MathematicaTip Twitter feed is full of gems, but one caught my eye concerning Alternatives. The post suggested using Alternatives as a way to find elements of one list that are members of another, e.g.:

Cases[$data, Alternatives @@ $interesting]

I decided to benchmark this against a naive MemberQ implementation. I was shocked by the results. Consider:

SeedRandom[0]
$data = RandomInteger[1000000, 1000000];
$interesting = DeleteDuplicates[RandomInteger[1000000, 1000]];
Length @$interesting // Print
Length@Cases[$data, Alternatives @@ $interesting] // Timing // Print

On my machine, running Mathematica 8.0.4, this runs in 0.218 seconds. For comparison, consider this:

Length@Cases[$data, n_ /; MemberQ[$interesting, n]] // Timing // Print

which runs in 58.064 seconds... 266 times slower. What surprises me is not the slowness of the MemberQ version, but rather the blistering speed of the Alternatives version. Clearly, Alternatives is being implemented by a hash lookup or something. This analysis is further supported by what happens if we increase the size of the "interesting" list from 1,000 to 10,000 elements. The Alternatives version runs in almost the same time (0.234 s) while the MemberQ predictably runs 10x slower (565.847 s). Even at 1,000,000 interesting values, Alternatives still runs fast (0.951s).

I suppose that all along my mental performance model of Alternatives has unfairly equated it to MemberQ. Also, I've only considered it for small numbers of alternatives. No more. Alternatives scales up very well.

Absolute Project Pathnames in the Eclipse Workspace Hierarchy

In a "normal" Eclipse (3.x) workspace, the project directories are stored in the root of the workspace directory. That is, they are siblings of the .metadata directory. In these circumstances, the Eclipse workspace will keep an (implicit) relative path reference to each project. Consequently, if you move the workspace to a different location in the file system, Eclipse can still find the projects using that relative path.

On the other hand, Eclipse can also reference projects that lie outside of the workspace directory. In such cases the project path references are explicit and absolute. Therefore, they remain invariant if the workspace directory is moved.

So far, so good. However, there is a troublesome intermediate case. If the project directories are stored in subfolders within the workspace directory, then Eclipse does not refer to them using relative pathnames. Unexpectedly, the paths are absolute. As a result, a relocated or copied workspace will still refer to the contained projects in their original locations.

Incidentally, Eclipse 3.6 stores each project location in a file located at .metadata/project name/.location. Creating a script to alter project location paths is unnecessarily complicated by the fact that the .location files are binary.

Blog Archive