... until the collector arrives ...

This "blog" is really just a scratchpad of mine. There is not much of general interest here. Most of the content is scribbled down "live" as I discover things I want to remember. I rarely go back to correct mistakes in older entries. You have been warned :)

2014-10-17

SQL Character Collation

The following SQL Server query will generate a list of characters for the first 65,536 code points in collation order. For illustration purposes, a WHERE clause has been added to show only those code points that lie within the printable ASCII range:

WITH
  n AS (
    SELECT 0 AS n
    UNION ALL SELECT n + 1 FROM n  WHERE n < 65536
  )
, c AS (
    SELECT n AS codePoint, NCHAR(n) AS char FROM n
  )
, coll AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY char, codePoint) AS seq FROM c
  )
SELECT
  *
FROM coll
WHERE codePoint BETWEEN 32 AND 126
ORDER BY seq
OPTION (MAXRECURSION 0)

2014-08-14

Passing integer lists to SQL Server, Redux

Newer versions of SQL Server support array-valued query parameters, but not all database drivers support them (e.g. jTDS or, at time of writing, the Microsoft JDBC driver). It is a widely known pattern to pass lists of integers in a CSV string, and then to use a TSQL function to parse that string. This method is also known to be quite slow for large lists. Here is a TSQL function alternative that uses a binary representation of the integers, and runs about an order of magnitude faster than the CSV approach:

CREATE FUNCTION dbo.fnBinaryToIntTable(@idList VARBINARY(MAX))
RETURNS TABLE AS RETURN
WITH indices AS (SELECT 0 AS i UNION ALL SELECT i + 1 FROM indices)
SELECT TOP (LEN(@idList) / 4)
  i + 1 AS idx
, CAST(SUBSTRING(@idList, i*4 + 1, 4) AS int) AS n
FROM indices

GO

SELECT * FROM dbo.fnBinaryToIntTable(CAST(134 AS BINARY(4)) + CAST(463 AS BINARY(4)))
OPTION(MAXRECURSION 0)

Unfortunately, this function requires the calling query to specify OPTION(MAXRECURSION 0) for lists with more than 100 elements. This is a significant source of error as it not an obvious calling requirement. It is possible to bake the option into the query by using an intermediate table variable:

CREATE FUNCTION dbo.fnBinaryToIntTable(@idList VARBINARY(MAX))
RETURNS @result TABLE(idx INT, n INT)
AS BEGIN
  WITH indices AS (SELECT 0 AS i UNION ALL SELECT i + 1 FROM indices)
  INSERT INTO @result
  SELECT TOP (LEN(@idList) / 4)
    i + 1 AS idx
  , CAST(SUBSTRING(@idList, i*4 + 1, 4) AS int) AS n
  FROM indices
  OPTION(MAXRECURSION 0)
  RETURN
END

... but this performs only 2-3x better than the CSV solution. Trade-offs, trade-offs.

2014-08-07

Com4J leaks COM objects

com4j provides a helper class ComObjectCollector. It is used to track and dispose of all COM objects created within a sequence of operations. For example:

ComObjectCollector objects = new ComObjectCollector();
COM4J.addListener(objects);

_Application xl = excel.ClassFactory.createApplication();
Workbooks workbooks = xl.getWorkbooks();
_Workbook workbook = workbooks.add(null, 0);
Com4jObject sheetObj = workbook.getWorksheets(1);
_Worksheet sheet = sheetObj.queryInterface(_Worksheet.class);
// ...

objects.disposeAll();
// BUG: "sheet" should be disposed at this point, but is not
COM4J.removeListener(objects);

Unfortunately, version com4j-20120426-2 leaks a COM reference in the exhibited code. Specifically, the reference held by sheet is not disposed by objects.disposeAll.

The reason is subtle. ComObjectCollector internally uses a WeakHashMap to hold the object references. The references are represented by the com4j Wrapper class. Wrapper redefines the equals method to account for aliased references. It turns out that this method considers sheetObj and sheet to be equal even though they are actually distinct references that need to be disposed individually. sheet is therefore not actually added to the map because of that equality. Consequently, it is not disposed by objects.dispoaseAll().

Incidentally, Wrapper should not be used with WeakHashMap since the latter stipulates that it should only be used with objects that implement identity equality.

The fix to the leak is actually quite simple... re-implement the trivial class ComObjectCollector. Here is an inline version:

final Map _objects = new IdentityHashMap<>();
ComObjectListener listener = new ComObjectListener() {
    @Override
    public void onNewObject(Com4jObject obj) {
        _objects.put(obj, null);
    }
};
COM4J.addListener(listener);

_Application xl = excel.ClassFactory.createApplication();
Workbooks workbooks = xl.getWorkbooks();
_Workbook workbook = workbooks.add(null, 0);
Com4jObject sheetObj = workbook.getWorksheets(1);
_Worksheet sheet = sheetObj.queryInterface(_Worksheet.class);
// ...

for (Com4jObject obj : _objects.keySet()) {
    obj.dispose();
}
COM4J.removeListener(listener);

This version uses strong references instead of weak references, but that should not be a problem in the absence of large numbers of transient references. A more elaborate implementation would have to either change the equality semantics of Wrapper (requiring an impact analysis), or hold more direct references to the underlying COM objects in the WeakHashMap.

2014-08-05

Broken Whole Row References in Apache POI

There is a bug in the Apache POI AreaReference.getWholeRow(). It returns a reference like $A1:$IV1. This is true for old Excel workbooks, but for the new format that POI supports as XSSFWorkbook, the reference should be $A1:$XFD1.

If you attempt to work around this by defining an explicit name for $A1:$XFD1, Excel will quietly rewrite the reference to $1:$1, triggering the bug once again in POI. A nasty work-around is to define a name as $A1:$XFC1 (note: XFC instead of XFD). Excel leaves this reference untouched and POI handles it correctly. Too bad it is an awfully inconvenient work-around.

2014-04-24

Hashing + Doubles = Sneaky Nondeterminism in a JUnit Test

One of our calculation unit tests would fail erratically. Most of the time, it would pass. But on some runs (one in five?) one calculated double value would mysteriously flip from 0.0 to -0.0. Floating-point numerical errors are well-known, but the nondeterminism was a surprise as we could not think of any random or time-based elements in our calculation.

The cause turned out to be missing hashCode()/equals() on an object that was placed into a map. In such cases, the identity hash is used. The identity hash is typically based upon a memory location or a handle identifier and, as such, can easily change from run to run. The entries in the map were being iterated over, but the changing hashes would cause the iteration order to change. Elaborately calculated values from each iteration were being summed, and the changing summation order would sometimes produce a -0.0 from an underflow.

The nondeterminism did not produce any material errors in the calculated results -- strictly speaking there was not a bug in the code. However, the intermittent failures of JUnit tests was unhelpful.

2014-04-13

Heartbleed Commit

Here is a link to the commit that introduced the Heartbleed OpenSSL Vulnerability.

2014-04-10

Eclipse OSGI - The Windows Virus

When Eclipse OSGI starts up, it checks to see whether certain locations are writable. Based on what it finds, it chooses locations for the configuration area, the user base directory, etc. What is "interesting" is the way that it determines whether a location is writable. Here is an excerpt from AdaptorUtil.java:

File fileTest = null;
try {
    // we use the .dll suffix to properly test on Vista virtual directories
    // on Vista you are not allowed to write executable files on virtual directories like "Program Files"
    fileTest = File.createTempFile("writtableArea", ".dll", installDir); //$NON-NLS-1$ //$NON-NLS-2$
} catch (IOException e) {
    //If an exception occured while trying to create the file, it means that it is not writtable
    return false;
} finally {
    if (fileTest != null)
        fileTest.delete();
}
return true;

The code attempts to create a file named writtableArea.dll in the target directory. This sets off alarm bells on many virus scanners. This came to my attention because exactly one of users reported that our RCP application would simply disappear after presenting the splash screen. They were using Kapersky with particularly draconian settings.

2014-01-08

Eclipse RCP vs. Changing a Perspective Name

Using Eclipse 3.7.2 or 4.3.1, I ran into a problem where I wanted to change the name of the perspective in an RCP application. I changed the title in the bundle.properties file and relaunched my application from within the Eclipse IDE. The title had not changed. My immediate thought was that the title was cached in the application workspace data. So I relaunched the application after changing the launch configuration to clear that data first. The old label remained. I cleaned and rebuilt the application projects. I exited and re-entered Eclipse. I shook my fist in frustration. None of these would make the label change.

A scan of all files within the IDE .metadata directly yielded a couple of binary files that contained the old string. After some head-scratching, it was determined that these files were in the application's configuration area. This can be cleared in the launch configuration by switching to the Configuration tab and ticking the Clear the configuration area before launching checkbox. After that, the new label took effect.

Blog Archive