Weakly Reachable

... until the collector arrives ...

This "blog" is really just a scratchpad of mine. There is not much of general interest here. Most of the content is scribbled down "live" as I discover things I want to remember. I rarely go back to correct mistakes in older entries. You have been warned :)


LOOKUP vs. Aspose Cells for Java

Consider the following Excel formula:

=LOOKUP(2, {"a",1,2,3}, {"x","one","two","three"})

In Excel 2013, this returns "two". In Aspose Cells for Java 8.3.2, it returns "N/A".

The Excel documentation is wishy-washy on what the correct result should be:

Important: The values in array must be placed in ascending order: ..., -2, -1, 0, 1, 2, ..., A-Z, FALSE, TRUE; otherwise, LOOKUP might not return the correct value. Uppercase and lowercase text are equivalent.

This tells us that it is inadvisable for our look-up table to have "a" before the numeric values 1, 2, 3. The Cells result is probably superior to Excel's, but unfortunately it can mean that a user's spreadsheet can produce different results.

This is obviously a corner case, and a spreadsheet that depends upon a well-defined result in such circumstances is asking for trouble. I reported this to Aspose not so much for a fix, but rather to potentially save another user from a long debugging session.


OSGI Configuration Area Considered Harmful

Our Eclipse RCP application has a test suite that is itself packaged up as an RCP application. One of the unit tests started failing, complaining that the expected hash code for the contents of a certain file is not as expected. The file had indeed been changed, and the test had been updated accordingly. Yet the test failure reported that the hash was the old one instead of the new one. We double-, triple-, quadruple-checked and verified that the correct file was on disk. And the hash code in the unit test matched the new file.

To add suspense, the test would only fail on one machine, the build server of course. The test passed on every other machine we tried, whether we ran it in the IDE or in a full headless build.

After much head-scratching to no avail, we finally went low tech and modified the code running on the build server to print out the path of the file in question. We were surprised, but not surprised, to find that the file being used by the test was not anywhere in the source tree, nor in the temporary build area. Rather, it was in the OSGI Configuration area.

Apparently, OSGI copies certain resources from their host plug-ins into the configuration area. Our code had asked for a file path (not URL) to the resource, so that behaviour is fair enough. What is confusing is that OSGI never refreshed that file even when the source resource had changed. Some of the resources had been refreshed, but not that one. I suppose that is what "eclipse -clean" is all about?

In hindsight, this is not the first such weird behaviour that I have seen with respect to the configuration area. I can recall instances when new perspective layouts, or even simple UI part labels, would not reflect their current values until the configuration area was cleared.

In case anyone was wondering, it is failures like this that drove us to package the test suite as an RCP application in the first place. Running the tests in the OSGI environment can bring forth obscure (and irritating) behaviours such as this.

Moral: if an OSGI app is behaving in a manner that defies explanation, see if the behaviour persists after clearing the configuration area.


SQL Character Collation

The following SQL Server query will generate a list of characters for the first 65,536 code points in collation order. For illustration purposes, a WHERE clause has been added to show only those code points that lie within the printable ASCII range:

  n AS (
    SELECT 0 AS n
    UNION ALL SELECT n + 1 FROM n  WHERE n < 65536
, c AS (
    SELECT n AS codePoint, NCHAR(n) AS char FROM n
, coll AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY char, codePoint) AS seq FROM c
FROM coll
WHERE codePoint BETWEEN 32 AND 126


Passing integer lists to SQL Server, Redux

Newer versions of SQL Server support array-valued query parameters, but not all database drivers support them (e.g. jTDS or, at time of writing, the Microsoft JDBC driver). It is a widely known pattern to pass lists of integers in a CSV string, and then to use a TSQL function to parse that string. This method is also known to be quite slow for large lists. Here is a TSQL function alternative that uses a binary representation of the integers, and runs about an order of magnitude faster than the CSV approach:

CREATE FUNCTION dbo.fnBinaryToIntTable(@idList VARBINARY(MAX))
WITH indices AS (SELECT 0 AS i UNION ALL SELECT i + 1 FROM indices)
SELECT TOP (LEN(@idList) / 4)
  i + 1 AS idx
, CAST(SUBSTRING(@idList, i*4 + 1, 4) AS int) AS n
FROM indices


SELECT * FROM dbo.fnBinaryToIntTable(CAST(134 AS BINARY(4)) + CAST(463 AS BINARY(4)))

Unfortunately, this function requires the calling query to specify OPTION(MAXRECURSION 0) for lists with more than 100 elements. This is a significant source of error as it not an obvious calling requirement. It is possible to bake the option into the query by using an intermediate table variable:

CREATE FUNCTION dbo.fnBinaryToIntTable(@idList VARBINARY(MAX))
RETURNS @result TABLE(idx INT, n INT)
  WITH indices AS (SELECT 0 AS i UNION ALL SELECT i + 1 FROM indices)
  INSERT INTO @result
  SELECT TOP (LEN(@idList) / 4)
    i + 1 AS idx
  , CAST(SUBSTRING(@idList, i*4 + 1, 4) AS int) AS n
  FROM indices

... but this performs only 2-3x better than the CSV solution. Trade-offs, trade-offs.

Blog Archive