Weakly Reachable

... until the collector arrives ...

This "blog" is really just a scratchpad of mine. There is not much of general interest here. Most of the content is scribbled down "live" as I discover things I want to remember. I rarely go back to correct mistakes in older entries. You have been warned :)

2014-08-14

Passing integer lists to SQL Server, Redux

Newer versions of SQL Server support table-valued query parameters, but not all database drivers support them (e.g. jTDS or, at time of writing, the Microsoft JDBC driver). It is a widely known pattern to pass lists of integers in a CSV string, and then to use a T-SQL function to parse that string. This method is also known to be quite slow for large lists. Here is an alternative T-SQL function that uses a binary representation of the integers (four big-endian bytes per value), and runs about an order of magnitude faster than the CSV approach:

CREATE FUNCTION dbo.fnBinaryToIntTable(@idList VARBINARY(MAX))
RETURNS TABLE AS RETURN
WITH indices AS (SELECT 0 AS i UNION ALL SELECT i + 1 FROM indices)
-- DATALENGTH counts bytes; LEN may ignore trailing 0x20 bytes (e.g. a final value of 32)
SELECT TOP (DATALENGTH(@idList) / 4)
  i + 1 AS idx
, CAST(SUBSTRING(@idList, i*4 + 1, 4) AS int) AS n
FROM indices

GO

SELECT * FROM dbo.fnBinaryToIntTable(CAST(134 AS BINARY(4)) + CAST(463 AS BINARY(4)))
OPTION(MAXRECURSION 0)

Unfortunately, this function requires the calling query to specify OPTION(MAXRECURSION 0) for lists with more than 100 elements. This is a significant source of error as it is not an obvious calling requirement. It is possible to bake the option into the function by using an intermediate table variable:

CREATE FUNCTION dbo.fnBinaryToIntTable(@idList VARBINARY(MAX))
RETURNS @result TABLE(idx INT, n INT)
AS BEGIN
  WITH indices AS (SELECT 0 AS i UNION ALL SELECT i + 1 FROM indices)
  INSERT INTO @result
  -- DATALENGTH counts bytes; LEN may ignore trailing 0x20 bytes (e.g. a final value of 32)
  SELECT TOP (DATALENGTH(@idList) / 4)
    i + 1 AS idx
  , CAST(SUBSTRING(@idList, i*4 + 1, 4) AS int) AS n
  FROM indices
  OPTION(MAXRECURSION 0)
  RETURN
END

... but this performs only 2-3x better than the CSV solution. Trade-offs, trade-offs.
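On the client side, the binary list has to be built in the same layout that the function decodes: four bytes per integer, most significant byte first, which is what CAST(134 AS BINARY(4)) produces. Here is a minimal Java sketch of that encoding. The class and method names (IntListEncoder, encode) are made up for illustration and are not part of any driver API:

```java
import java.nio.ByteBuffer;

final class IntListEncoder {
    // Packs each int as 4 big-endian bytes, the layout that
    // CAST(SUBSTRING(@idList, i*4 + 1, 4) AS int) reads back.
    static byte[] encode(int... ids) {
        // ByteBuffer is big-endian by default, so no byte-order call is needed.
        ByteBuffer buffer = ByteBuffer.allocate(ids.length * 4);
        for (int id : ids) {
            buffer.putInt(id);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder("0x");
        for (byte b : encode(134, 463)) {
            hex.append(String.format("%02X", b));
        }
        System.out.println(hex); // prints 0x00000086000001CF
    }
}
```

The resulting array could then be bound to the @idList parameter with PreparedStatement.setBytes, assuming the driver in use maps byte[] to VARBINARY.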

2014-08-07

Com4J leaks COM objects

com4j provides a helper class ComObjectCollector. It is used to track and dispose of all COM objects created within a sequence of operations. For example:

ComObjectCollector objects = new ComObjectCollector();
COM4J.addListener(objects);

_Application xl = excel.ClassFactory.createApplication();
Workbooks workbooks = xl.getWorkbooks();
_Workbook workbook = workbooks.add(null, 0);
Com4jObject sheetObj = workbook.getWorksheets(1);
_Worksheet sheet = sheetObj.queryInterface(_Worksheet.class);
// ...

objects.disposeAll();
// BUG: "sheet" should be disposed at this point, but is not
COM4J.removeListener(objects);

Unfortunately, version com4j-20120426-2 leaks a COM reference in the exhibited code. Specifically, the reference held by sheet is not disposed by objects.disposeAll.

The reason is subtle. ComObjectCollector internally uses a WeakHashMap to hold the object references. The references are represented by the com4j Wrapper class. Wrapper redefines the equals method to account for aliased references. It turns out that this method considers sheetObj and sheet to be equal even though they are actually distinct references that need to be disposed individually. Because of that equality, sheet is never actually added to the map. Consequently, it is not disposed by objects.disposeAll().

Incidentally, Wrapper should not be used with WeakHashMap since the latter stipulates that it should only be used with objects that implement identity equality.
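The effect can be reproduced without com4j. In the sketch below, Handle is a hypothetical stand-in for Wrapper (its fields and its hashCode are assumptions, not the real com4j code): two distinct Java objects compare equal because they alias the same underlying object. A WeakHashMap keeps only one of them, while an IdentityHashMap keeps both:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

final class AliasedKeys {
    // Hypothetical stand-in for com4j's Wrapper: equality is based on the
    // underlying object, not on the identity of the Java reference.
    static final class Handle {
        final int comObjectId;

        Handle(int comObjectId) {
            this.comObjectId = comObjectId;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Handle && ((Handle) other).comObjectId == comObjectId;
        }

        @Override
        public int hashCode() {
            return comObjectId;
        }
    }

    // Static fields keep both handles strongly reachable, so the garbage
    // collector cannot clear the WeakHashMap entry during the demo.
    static final Handle SHEET_OBJ = new Handle(42);
    static final Handle SHEET = new Handle(42);

    // Registers both handles and reports how many the tracker retained.
    static int track(Map<Handle, Object> tracker) {
        tracker.put(SHEET_OBJ, null);
        tracker.put(SHEET, null); // treated as a duplicate key by equals-based maps
        return tracker.size();
    }

    public static void main(String[] args) {
        System.out.println(track(new WeakHashMap<>()));     // prints 1
        System.out.println(track(new IdentityHashMap<>())); // prints 2
    }
}
```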

The fix to the leak is actually quite simple... re-implement the trivial class ComObjectCollector. Here is an inline version:

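// IdentityHashMap compares keys by reference, so aliased wrappers are tracked separately
final Map<Com4jObject, Object> _objects = new IdentityHashMap<>();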
ComObjectListener listener = new ComObjectListener() {
    @Override
    public void onNewObject(Com4jObject obj) {
        _objects.put(obj, null);
    }
};
COM4J.addListener(listener);

_Application xl = excel.ClassFactory.createApplication();
Workbooks workbooks = xl.getWorkbooks();
_Workbook workbook = workbooks.add(null, 0);
Com4jObject sheetObj = workbook.getWorksheets(1);
_Worksheet sheet = sheetObj.queryInterface(_Worksheet.class);
// ...

for (Com4jObject obj : _objects.keySet()) {
    obj.dispose();
}
COM4J.removeListener(listener);

This version uses strong references instead of weak references, but that should not be a problem in the absence of large numbers of transient references. A more elaborate implementation would have to either change the equality semantics of Wrapper (requiring an impact analysis), or hold more direct references to the underlying COM objects in the WeakHashMap.

2014-08-05

Broken Whole Row References in Apache POI

There is a bug in the Apache POI method AreaReference.getWholeRow(). It returns a reference like $A1:$IV1. This is correct for workbooks in the old Excel format, which is limited to 256 columns, but for the new format that POI supports as XSSFWorkbook, which allows 16,384 columns, the reference should be $A1:$XFD1.

If you attempt to work around this by defining an explicit name for $A1:$XFD1, Excel will quietly rewrite the reference to $1:$1, triggering the bug once again in POI. A nasty work-around is to define a name as $A1:$XFC1 (note: XFC instead of XFD). Excel leaves this reference untouched and POI handles it correctly. Too bad it is an awfully inconvenient work-around.
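For reference, the column names above follow from the column limits of the two formats: 256 columns for the old format and 16,384 for the new one. Here is a small Java sketch of the conversion from a 1-based column number to its letters. The helper columnName is written for this note and is not a POI method:

```java
final class ColumnNames {
    // Converts a 1-based column number to Excel letters (1 -> A, 27 -> AA).
    static String columnName(int column) {
        StringBuilder name = new StringBuilder();
        while (column > 0) {
            int remainder = (column - 1) % 26;         // 0..25 maps to A..Z
            name.insert(0, (char) ('A' + remainder));
            column = (column - 1) / 26;
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(columnName(256));   // prints IV
        System.out.println(columnName(16384)); // prints XFD
        System.out.println(columnName(16383)); // prints XFC
    }
}
```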

2014-04-24

Hashing + Doubles = Sneaky Nondeterminism in a JUnit Test

One of our calculation unit tests would fail erratically. Most of the time, it would pass. But on some runs (one in five?) one calculated double value would mysteriously flip from 0.0 to -0.0. Floating-point numerical errors are well-known, but the nondeterminism was a surprise as we could not think of any random or time-based elements in our calculation.

The cause turned out to be missing hashCode()/equals() on an object that was placed into a map. In such cases, the identity hash is used. The identity hash is typically based upon a memory location or a handle identifier and, as such, can easily change from run to run. The entries in the map were being iterated over, but the changing hashes would cause the iteration order to change. Elaborately calculated values from each iteration were being summed, and the changing summation order would sometimes produce a -0.0 from an underflow.

The nondeterminism did not produce any material errors in the calculated results -- strictly speaking there was not a bug in the code. However, the intermittent failures of the JUnit tests were unhelpful.
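The order sensitivity is easy to reproduce in isolation. The sketch below is not our calculation; the values and the scaling factor are contrived so that the same four numbers, summed in two different orders, end up as 0.0 in one case and underflow to -0.0 in the other:

```java
final class SummationOrder {
    // Sums the values left to right, then scales the total by 1e-400 in two
    // steps so that any nonzero total underflows to a signed zero.
    static double scaledSum(double... values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total * 1e-200 * 1e-200;
    }

    public static void main(String[] args) {
        // 1e16 - 1e16 = 0, then + 1 - 1 = 0.0
        System.out.println(scaledSum(1e16, -1e16, 1.0, -1.0)); // prints 0.0
        // 1e16 + 1 rounds back to 1e16, so the total ends up as -1.0,
        // and -1.0 * 1e-400 underflows to negative zero
        System.out.println(scaledSum(1e16, 1.0, -1e16, -1.0)); // prints -0.0
    }
}
```

Note that 0.0 == -0.0 is true in Java, but Double.compare and Double.equals distinguish the two values, so whether a test notices the flip depends on how it compares doubles.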
