tag:blogger.com,1999:blog-40325318709368313302024-02-18T19:44:42.577-07:00Weakly Reachable... until the collector arrives ...MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comBlogger589125tag:blogger.com,1999:blog-4032531870936831330.post-78768353453615889802017-05-24T11:43:00.001-06:002017-05-24T11:52:51.246-06:00Obsolete Cookie Support in Java<p>The current HTTP cookie specification is <a href="https://tools.ietf.org/html/rfc6265">RFC 6265</a>, published back in 2011. It obsoletes <a href="https://tools.ietf.org/html/rfc2965">RFC 2965</a> (from 2000) which in turn obsoletes <a href="https://tools.ietf.org/html/rfc2109">RFC 2109</a> (1997). RFC 6265 retired support for the pseudo-cookies whose names start with a dollar sign. Such cookies used to be interpreted as cookie attributes (e.g. <em>Path</em>, <em>Version</em> etc). Web servers will now often reject requests containing cookie names that start with "$".</p>
<p>
Unfortunately, the Java HTTP client does not yet support RFC 6265 (as of version 8). It still implements RFC 2965 (and supports RFC 2109). In particular, it generates $-prefixed pseudo-cookies and is thus incompatible with many contemporary servers. The incompatibility may manifest as <em>400 Bad Request</em> errors, sometimes only when using secure protocols, where HTTP message syntax checking tends to be more stringent.</p>
<p>
Use the <a href="https://hc.apache.org/">Apache client</a> instead.
</p>
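<p>The legacy formatting can be seen directly in <code>java.net.HttpCookie</code>, which still emits RFC 2965-style $-attributes for version-1 cookies. A small demonstration (the cookie name and value are made up):</p>

```java
import java.net.HttpCookie;

public class CookieVersionDemo {
    public static void main(String[] args) {
        // Version 1 selects the RFC 2965 format, which carries the
        // $-prefixed pseudo-attributes that RFC 6265 retired.
        HttpCookie legacy = new HttpCookie("session", "abc123");
        legacy.setVersion(1);
        legacy.setPath("/");
        System.out.println(legacy); // contains $Path -- rejected by strict RFC 6265 servers

        // Version 0 selects the old Netscape format: a plain name=value pair,
        // which modern servers accept.
        HttpCookie plain = new HttpCookie("session", "abc123");
        plain.setVersion(0);
        plain.setPath("/");
        System.out.println(plain); // prints a plain name=value pair
    }
}
```

<p>Forcing version 0 on every cookie is a crude mitigation at best; switching HTTP clients remains the cleaner fix.</p>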
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-82558517490241499802016-08-15T11:37:00.000-06:002016-08-15T11:37:04.954-06:00Windows RDP vs. Password Changes<p>
On Windows, there is an obscure security policy that prohibits changing one's password for the first time through Remote Desktop. Normally, one can type CTRL-ALT-DEL to bring up a menu that, among other things, allows a password change. When logged in remotely using RDP, the key chord is CTRL-ALT-END. However, Windows can detect this situation, and if the very first password change is attempted after using CTRL-ALT-END then it will be rejected as "access denied". There is a work-around, though:
</p>
<ol>
<li>Bring up the onscreen keyboard (OSK.EXE).</li>
<li>Type CTRL-ALT using the regular keyboard.</li>
<li>Press DEL on the onscreen keyboard.</li>
<li>The password change function will now operate normally (i.e. no "access denied").</li>
</ol>
<p>
This is helpful on virtual machines where remote access is the only possibility. The purpose of this strange security policy remains unclear.
</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-82252127988518761932016-08-15T11:29:00.000-06:002016-08-15T11:29:16.273-06:00Windows Event Logs: Command-line Access<p>The following Windows command will extract events from the <em>Security</em> log on the machine <em>somehost</em> provided those events have the ID <em>4624</em> (login) and are for the user <em>someuser</em>:</p>
<blockquote><code><pre>wevtutil ^
query-events ^
/r:somehost ^
Security ^
/q:"Event[System[EventID=4624] and EventData[Data[@Name='TargetUserName']='someuser']]"</pre></code></blockquote>
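<p>Because each matching event is emitted as a free-standing XML fragment, the combined output is not itself well-formed XML. One way to consume it from Java is to wrap the fragments in a synthetic root element before parsing; a sketch (the <code>Events</code> wrapper name is arbitrary):</p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class EventWrapper {
    // Wrapping the concatenated fragments in a synthetic root element
    // yields a single well-formed document that any XML parser accepts.
    static Document parseFragments(String fragments) throws Exception {
        String wrapped = "<Events>" + fragments + "</Events>";
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(wrapped.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String sample = "<Event><System><EventID>4624</EventID></System></Event>"
                      + "<Event><System><EventID>4624</EventID></System></Event>";
        NodeList events = parseFragments(sample).getElementsByTagName("Event");
        System.out.println(events.getLength()); // 2
    }
}
```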
The query syntax is described as XPath that operates upon the XML representation of the events. Beware that it is actually a weird subset of XPath that does not recognize <code>.</code>, <code>..</code>, <code>/</code>, or <code>//</code>. Also, the generated XML results are free-standing fragments for each event (i.e. not wrapped by a parent element and thus not well-formed XML).MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-45642817419906956722016-06-23T09:22:00.002-06:002016-06-23T09:22:31.463-06:00Dependency Injection Frameworks<p>System requirements that justify the use of a dependency injection framework:</p>
<ol>
<li>The system has at least 64 components whose implementations must be selectable at run-time by editing an external text file.</li>
<li>The development team is not permitted to know the application initialization sequence.</li>
</ol>
<p>:)</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-88483413424171588742015-08-17T12:54:00.001-06:002015-08-17T12:54:26.893-06:00LOOKUP vs. Aspose Cells for Java<p>Consider the following Excel formula:</p>
<code>=LOOKUP(2, {"a",1,2,3}, {"x","one","two","three"})</code>
<p>In Excel 2013, this returns "two". In Aspose Cells for Java 8.3.2, it returns "N/A".</p>
<p>The Excel documentation is wishy-washy on what the correct result should be:</p>
<blockquote><em>
Important: The values in array must be placed in ascending order: ..., -2, -1, 0, 1, 2, ..., A-Z, FALSE, TRUE; otherwise, LOOKUP might not return the correct value. Uppercase and lowercase text are equivalent.
</em></blockquote>
<p>
This tells us that it is inadvisable for our look-up table to have <code>"a"</code> before the numeric values <code>1, 2, 3</code>. The Cells result is probably superior to Excel's, but the discrepancy unfortunately means that the same spreadsheet can produce different results in Excel and in Cells.
</p>
<p>
This is obviously a corner case, and a spreadsheet that depends upon a well-defined result in such circumstances is asking for trouble. I reported this to Aspose not so much for a fix, but rather to potentially save another user from a long debugging session.
</p>MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-64568224879123884682015-01-29T15:13:00.000-07:002015-01-29T15:13:09.881-07:00OSGI Configuration Area Considered Harmful<p>Our Eclipse RCP application has a test suite that is itself packaged up as an RCP application. One of the unit tests started failing, complaining that the hash code for the contents of a certain file was not as expected. The file had indeed been changed, and the test had been updated accordingly. Yet the test failure reported that the hash was the old one instead of the new one. We double-, triple-, quadruple-checked and verified that the correct file was on disk. And the hash code in the unit test matched the new file.</p>
<p>To add suspense, the test would only fail on one machine, the build server of course. The test passed on every other machine we tried, whether we ran it in the IDE or in a full headless build.</p>
<p>After much head-scratching to no avail, we finally went low tech and modified the code running on the build server to print out the path of the file in question. We were surprised, but not surprised, to find that the file being used by the test was not anywhere in the source tree, nor in the temporary build area. Rather, it was in the OSGI Configuration area.</p>
<p>Apparently, OSGI copies certain resources from their host plug-ins into the configuration area. Our code had asked for a file path (not URL) to the resource, so that behaviour is fair enough. What is confusing is that OSGI never refreshed that file even when the source resource had changed. Some of the resources had been refreshed, but not that one. I suppose that is what "eclipse -clean" is all about?</p>
<p>In hindsight, this is not the first such weird behaviour that I have seen with respect to the configuration area. I can recall instances when new perspective layouts, or even simple UI part labels, would not reflect their current values until the configuration area was cleared.</p>
<p>In case anyone was wondering, it is failures like this that drove us to package the test suite as an RCP application in the first place. Running the tests in the OSGI environment can bring forth obscure (and irritating) behaviours such as this.</p>
<p>Moral: if an OSGI app is behaving in a manner that defies explanation, see if the behaviour persists after clearing the configuration area.</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-50189870750554838232014-10-17T12:45:00.000-06:002014-10-17T12:45:16.354-06:00SQL Character Collation<p>The following SQL Server query will generate a list of characters for the first 65,536 code points in collation order. For illustration purposes, a <code>WHERE</code> clause has been added to show only those code points that lie within the printable ASCII range:</p>
<blockquote><code><pre>WITH
n AS (
SELECT 0 AS n
UNION ALL SELECT n + 1 FROM n WHERE n < 65535
)
, c AS (
SELECT n AS codePoint, NCHAR(n) AS [char] FROM n
)
, coll AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY [char], codePoint) AS seq FROM c
)
SELECT
*
FROM coll
WHERE codePoint BETWEEN 32 AND 126
ORDER BY seq
OPTION (MAXRECURSION 0)</pre></code></blockquote>MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-57414256439744794872014-08-14T12:12:00.001-06:002014-08-14T12:18:13.133-06:00Passing integer lists to SQL Server, Redux<p>Newer versions of SQL Server support table-valued parameters, but not all database drivers support them (e.g. jTDS or, at time of writing, the Microsoft JDBC driver). It is a widely known pattern to pass lists of integers in a CSV string, and then to use a TSQL function to parse that string. This method is also known to be quite slow for large lists. Here is a TSQL function alternative that uses a binary representation of the integers, and runs about an order of magnitude faster than the CSV approach:</p>
<blockquote><code><pre>CREATE FUNCTION dbo.fnBinaryToIntTable(@idList VARBINARY(MAX))
RETURNS TABLE AS RETURN
WITH indices AS (SELECT 0 AS i UNION ALL SELECT i + 1 FROM indices)
SELECT TOP (DATALENGTH(@idList) / 4)
i + 1 AS idx
, CAST(SUBSTRING(@idList, i*4 + 1, 4) AS int) AS n
FROM indices
GO
SELECT * FROM dbo.fnBinaryToIntTable(CAST(134 AS BINARY(4)) + CAST(463 AS BINARY(4)))
OPTION(MAXRECURSION 0)</pre></code></blockquote>
<p>Unfortunately, this function requires the calling query to specify <code>OPTION(MAXRECURSION 0)</code> for lists with more than 100 elements. This is a significant source of error as it is not an obvious calling requirement. It is possible to bake the option into the query by using an intermediate table variable:</p>
<blockquote><code><pre>CREATE FUNCTION dbo.fnBinaryToIntTable(@idList VARBINARY(MAX))
RETURNS @result TABLE(idx INT, n INT)
AS BEGIN
WITH indices AS (SELECT 0 AS i UNION ALL SELECT i + 1 FROM indices)
INSERT INTO @result
SELECT TOP (DATALENGTH(@idList) / 4)
i + 1 AS idx
, CAST(SUBSTRING(@idList, i*4 + 1, 4) AS int) AS n
FROM indices
OPTION(MAXRECURSION 0)
RETURN
END</pre></code></blockquote>
<p>... but this performs only 2-3x better than the CSV solution. Trade-offs, trade-offs.</p>
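<p>On the client side, the binary list is straightforward to produce. SQL Server's <code>BINARY(4)</code> representation of an <code>int</code> is big-endian, which matches the default byte order of a Java <code>ByteBuffer</code>. A minimal sketch of an encoder (the class and method names are made up):</p>

```java
import java.nio.ByteBuffer;

public class IdListEncoder {
    // Packs ids as big-endian 4-byte integers, matching the layout that
    // the TSQL function decodes with SUBSTRING + CAST(... AS int).
    static byte[] encode(int... ids) {
        ByteBuffer buf = ByteBuffer.allocate(ids.length * 4); // big-endian by default
        for (int id : ids) {
            buf.putInt(id);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] bytes = encode(134, 463);
        System.out.println(bytes.length); // 8
        // Bind with PreparedStatement.setBytes(...) on a query that
        // joins against the binary-to-int table function.
    }
}
```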
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-83983793770400402532014-08-07T08:48:00.001-06:002014-08-07T08:52:20.146-06:00Com4J leaks COM objects<p><a href="http://com4j.kohsuke.org/">com4j</a> provides a helper class <code>ComObjectCollector</code>. It is used to track and dispose of all COM objects created within a sequence of operations. For example:</p>
<blockquote><code><pre>ComObjectCollector objects = new ComObjectCollector();
COM4J.addListener(objects);
_Application xl = excel.ClassFactory.createApplication();
Workbooks workbooks = xl.getWorkbooks();
_Workbook workbook = workbooks.add(null, 0);
Com4jObject sheetObj = workbook.getWorksheets(1);
_Worksheet sheet = sheetObj.queryInterface(_Worksheet.class);
// ...
objects.disposeAll();
// BUG: "sheet" should be disposed at this point, but is not
COM4J.removeListener(objects);</pre></code></blockquote>
<p>Unfortunately, version <em>com4j-20120426-2</em> leaks a COM reference in the exhibited code. Specifically, the reference held by <code>sheet</code> is not disposed by <em>objects.disposeAll</em>.</p>
<p>The reason is subtle. <code>ComObjectCollector</code> internally uses a <code>WeakHashMap</code> to hold the object references. The references are represented by the com4j <code>Wrapper</code> class. <code>Wrapper</code> redefines the <code>equals</code> method to account for aliased references. It turns out that this method considers <code>sheetObj</code> and <code>sheet</code> to be equal even though they are actually distinct references that need to be disposed individually. <code>sheet</code> is therefore not actually added to the map because of that equality. Consequently, it is not disposed by <code>objects.disposeAll()</code>.</p>
<p>Incidentally, <code>Wrapper</code> should not be used with <code>WeakHashMap</code> at all, since the latter is documented as intended primarily for keys whose <code>equals</code> methods test object identity.</p>
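<p>The effect is easy to reproduce with a hypothetical stand-in for <code>Wrapper</code>: distinct instances that compare equal collapse to a single entry in an equality-based map, while an <code>IdentityHashMap</code> keeps both:</p>

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class AliasedKeyDemo {
    // A made-up stand-in for com4j's Wrapper: distinct instances that
    // nevertheless compare equal (mimicking aliased COM references).
    static final class Ref {
        @Override public boolean equals(Object o) { return o instanceof Ref; }
        @Override public int hashCode() { return 42; }
    }

    public static void main(String[] args) {
        Ref a = new Ref();
        Ref b = new Ref(); // a second reference that needs its own disposal

        Map<Ref, Object> byEquality = new HashMap<>();
        byEquality.put(a, null);
        byEquality.put(b, null); // silently collapses onto the first entry

        Map<Ref, Object> byIdentity = new IdentityHashMap<>();
        byIdentity.put(a, null);
        byIdentity.put(b, null); // both entries retained

        System.out.println(byEquality.size()); // 1
        System.out.println(byIdentity.size()); // 2
    }
}
```

<p>This is exactly why the re-implemented collector below uses <code>IdentityHashMap</code>.</p>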
<p>The fix to the leak is actually quite simple... re-implement the trivial class <em>ComObjectCollector</em>. Here is an inline version:</p>
<blockquote><code><pre>final Map<Com4jObject, Object> _objects = new IdentityHashMap<>();
ComObjectListener listener = new ComObjectListener() {
@Override
public void onNewObject(Com4jObject obj) {
_objects.put(obj, null);
}
};
COM4J.addListener(listener);
_Application xl = excel.ClassFactory.createApplication();
Workbooks workbooks = xl.getWorkbooks();
_Workbook workbook = workbooks.add(null, 0);
Com4jObject sheetObj = workbook.getWorksheets(1);
_Worksheet sheet = sheetObj.queryInterface(_Worksheet.class);
// ...
for (Com4jObject obj : _objects.keySet()) {
obj.dispose();
}
COM4J.removeListener(listener);</pre></code></blockquote>
<p>This version uses strong references instead of weak references, but that should not be a problem in the absence of large numbers of transient references. A more elaborate implementation would have to either change the equality semantics of <code>Wrapper</code> (requiring an impact analysis), or hold more direct references to the underlying COM objects in the <code>WeakHashMap</code>.</p>MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-62351660552523681342014-08-05T15:56:00.000-06:002014-08-05T15:57:29.243-06:00Broken Whole Row References in Apache POI<p>There is a bug in the Apache POI <a href="http://grepcode.com/file/repo1.maven.org/maven2/org.apache.poi/poi/3.10-FINAL/org/apache/poi/ss/util/AreaReference.java?av=f#172">AreaReference.getWholeRow()</a>. It returns a reference like <code>$A1:$IV1</code>. This is true for old Excel workbooks, but for the new format that POI supports as <code>XSSFWorkbook</code>, the reference should be <code>$A1:$XFD1</code>.</p>
<p>If you attempt to work around this by defining an explicit name for <code>$A1:$XFD1</code>, Excel will quietly rewrite the reference to <code>$1:$1</code>, triggering the bug once again in POI. A nasty work-around is to define a name as <code>$A1:$XFC1</code> (note: XFC instead of XFD). Excel leaves this reference untouched and POI handles it correctly. Too bad it is an awfully inconvenient work-around.</p>
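<p>For reference, the old BIFF format ends at column IV (0-based index 255) while the OOXML format ends at XFD (index 16383). A small stand-alone helper (not POI code) showing the index-to-name mapping:</p>

```java
public class ColumnName {
    // Converts a 0-based column index to an Excel column name (A, B, ..., XFD).
    static String name(int index) {
        StringBuilder sb = new StringBuilder();
        for (int i = index; i >= 0; i = i / 26 - 1) {
            sb.insert(0, (char) ('A' + i % 26));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(name(255));   // IV  (last column, .xls)
        System.out.println(name(16383)); // XFD (last column, .xlsx)
    }
}
```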
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-87203669119346263782014-04-24T16:18:00.000-06:002014-04-24T16:18:02.032-06:00Hashing + Doubles = Sneaky Nondeterminism in a JUnit Test<p>One of our calculation unit tests would fail erratically. Most of the time, it would pass. But on some runs (one in five?) one calculated double value would mysteriously flip from 0.0 to -0.0. Floating-point numerical errors are well-known, but the nondeterminism was a surprise as we could not think of any random or time-based elements in our calculation.</p>
<p>The cause turned out to be missing hashCode()/equals() on an object that was placed into a map. In such cases, the identity hash is used. The identity hash is typically based upon a memory location or a handle identifier and, as such, can easily change from run to run. The entries in the map were being iterated over, but the changing hashes would cause the iteration order to change. Elaborately calculated values from each iteration were being summed, and the changing summation order would sometimes produce a -0.0 from an underflow.</p>
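<p>Both ingredients are easy to demonstrate in isolation: <code>0.0</code> and <code>-0.0</code> compare equal as primitives but not as boxed values (which is why an assertion on the result can fail), and floating-point addition is not associative, so iteration order changes sums:</p>

```java
public class MinusZeroDemo {
    public static void main(String[] args) {
        // Primitive comparison treats the two zeros as equal, but boxed
        // equality compares bit patterns, so a flipped zero fails equals().
        System.out.println(0.0 == -0.0);                                  // true
        System.out.println(Double.valueOf(0.0).equals(Double.valueOf(-0.0))); // false

        // Addition is not associative: summing the same values in a
        // different order produces a different result.
        double a = 1e308, b = -1e308, c = 1.0;
        System.out.println((a + b) + c); // 1.0
        System.out.println(a + (b + c)); // 0.0
    }
}
```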
<p>The nondeterminism did not produce any material errors in the calculated results -- strictly speaking there was not a bug in the code. However, the intermittent failures of JUnit tests were unhelpful.</p>MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-44861423516562414792014-04-13T13:12:00.003-06:002014-04-13T13:12:50.894-06:00Heartbleed Commit<p>Here is a link to the <a href="http://git.openssl.org/gitweb/?p=openssl.git;a=blob;f=ssl/t1_lib.c;h=c5c805cce286d12d81c5fdccfe9173d61a68ee82;hb=4817504d069b4c5082161b02a22116ad75f822b1#l2407">commit</a> that introduced the Heartbleed OpenSSL Vulnerability.</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-87347748360946140132014-04-10T17:06:00.001-06:002014-04-10T17:06:13.494-06:00Eclipse OSGI - The Windows Virus<p>When Eclipse OSGI starts up, it checks to see whether certain locations are writable. Based on what it finds, it chooses locations for the configuration area, the user base directory, etc. What is "interesting" is the way that it determines whether a location is writable. Here is an excerpt from <a href="http://grepcode.com/file/repository.grepcode.com/java/eclipse.org/3.7.2/org.eclipse/osgi/3.7.2/org/eclipse/osgi/internal/baseadaptor/AdaptorUtil.java#208">AdaptorUtil.java</a>:</p>
<blockquote><code><pre>File fileTest = null;
try {
// we use the .dll suffix to properly test on Vista virtual directories
// on Vista you are not allowed to write executable files on virtual directories like "Program Files"
fileTest = File.createTempFile("writtableArea", ".dll", installDir); //$NON-NLS-1$ //$NON-NLS-2$
} catch (IOException e) {
//If an exception occured while trying to create the file, it means that it is not writtable
return false;
} finally {
if (fileTest != null)
fileTest.delete();
}
return true;</pre></code></blockquote>
<p>The code attempts to create a file named <em>writtableArea.dll</em> in the target directory. This sets off alarm bells on many virus scanners. This came to my attention because exactly one of our users reported that our RCP application would simply disappear after presenting the splash screen. They were using Kaspersky with particularly draconian settings.</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-28208123562310553022014-01-08T15:23:00.003-07:002014-01-08T15:23:49.922-07:00Eclipse RCP vs. Changing a Perspective Name<p>Using Eclipse 3.7.2 or 4.3.1, I ran into a problem where I wanted to change the name of the perspective in an RCP application. I changed the title in the <em>bundle.properties</em> file and relaunched my application from within the Eclipse IDE. The title had not changed. My immediate thought was that the title was cached in the application workspace data. So I relaunched the application after changing the launch configuration to clear that data first. The old label remained. I cleaned and rebuilt the application projects. I exited and re-entered Eclipse. I shook my fist in frustration. None of these would make the label change.</p>
<p>A scan of all files within the IDE <em>.metadata</em> directory yielded a couple of binary files that contained the old string. After some head-scratching, it was determined that these files were in the application's configuration area. This can be cleared in the launch configuration by switching to the <em>Configuration</em> tab and ticking the <em>Clear the configuration area before launching</em> checkbox. After that, the new label took effect.</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-88810342255498167482013-10-08T09:50:00.001-06:002013-10-08T10:38:19.637-06:00Inspecting the SQL Server Plan Cache<p>While investigating an insidious case of SQL Server query plan cache poisoning, I looked for a way to see the cached plans. Here is what I used:</p>
<blockquote><code><pre>WITH data AS (
SELECT
c.usecounts AS uses
, s.max_worker_time AS max_time
, CAST(
p.query_plan.query('
declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
for $d in distinct-values(//ColumnReference/@Database)
order by $d
return data($d)
') AS NVARCHAR(MAX)) AS databases
, CAST(
p.query_plan.query('
declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
for $t in distinct-values(//ColumnReference/@Table)
order by $t
return data($t)
') AS NVARCHAR(MAX)) AS tables
, c.objtype
, t.text AS sql
, p.query_plan
, s.last_execution_time AS executed
, s.execution_count AS runs
, s.creation_time AS created
, c.plan_handle
, s.sql_handle
, s.query_hash
, s.query_plan_hash
FROM sys.dm_exec_cached_plans AS c
LEFT JOIN sys.dm_exec_query_stats AS s
ON s.plan_handle = c.plan_handle
CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(c.plan_handle) AS p
)
SELECT TOP 100 *
FROM data
ORDER BY max_time DESC</pre></code></blockquote>
<p>Note the final two hash columns. They help identify cases where different plans have been cached for a single SQL statement that has been executed with different parameter values.</p>
<p>Cached plans can be cleared using <code>DBCC FREEPROCCACHE</code> or <code>DBCC FREESYSTEMCACHE</code>. When benchmarking SQL, it can be useful to discard SQL Server's internal buffers. This can be done without restarting the server thus:</p>
<blockquote><code><pre>CHECKPOINT
GO
DBCC DROPCLEANBUFFERS</pre></code></blockquote>
<p>On a related note, the query hint <a href="http://blogs.msdn.com/b/sqlprogrammability/archive/2008/11/26/optimize-for-unknown-a-little-known-sql-server-2008-feature.aspx"><code>OPTIMIZE FOR UNKNOWN</code></a> can be used to essentially eliminate parameter-sniffing -- reducing non-deterministic optimizer behaviour.</p>MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-87764838877518450122013-08-09T13:27:00.000-06:002013-08-09T17:36:20.067-06:00Language Constructs<p><em>C</em> programmers, whose language maps directly to machine instructions, scoff at the bloat, tediousness and inefficiencies of ultra-high-level languages like <em>C++</em>.</p>
<p><em>Java</em> programmers, whose language constructs map directly to the output of <code>paste -d '\n' <(man cobol) <(man simula)</code>, rejoice that they do not have to deal with the bit-twiddling and low-level trivia of machine languages like <em>C++</em>.</p>
<p><em>LISP</em> programmers, whose language <strike>maps directly to</strike> is the metaphysical substrate of the cosmos, are confused by these opinions since they see no distinction between high-level and low-level code.</p>
<p><em>Perl</em> programmers dismiss the whole debate as academic since their language maps directly to the structure of a human mind (named <em>Larry</em>).</p>
<p>:)</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-14185604194858372692013-06-04T11:49:00.000-06:002013-06-04T11:50:52.610-06:00Using Internal Java Implementation Classes<p>The build of our Java 7 application failed today. Someone had referenced an internal Java implementation class within our code, namely <code>com.sun.xml.internal.txw2.output.IndentingXMLStreamWriter</code>. We use Eclipse, and this reference compiled and ran fine within the IDE. However, the PDE build would fail saying <em>cannot find symbol</em>. The difference can be explained when one realizes that the IDE compiles using the Eclipse compiler, but the PDE build uses <em>javac</em>.
</p>
<p>However, one might still be stumped as <code>IndentingXMLStreamWriter</code> can be found within <em>rt.jar</em>. It turns out that <em>javac</em> does not actually compile directly against <em>rt.jar</em>. Instead, it references the file <em>lib/ct.sym</em>. This file is essentially a zip archive containing all of the "blessed" classes. <code>IndentingXMLStreamWriter</code> is not in that file.</p>
<p>It is best to avoid using internal implementation classes, but there is a work-around if one insists. The undocumented compiler argument <code>-XDignore.symbol.file</code> will use <em>rt.jar</em> directly.</p>
<p>The details of this mechanism can be found by inspecting the OpenJDK source files <a href="http://hg.openjdk.java.net/jdk7/jdk7/langtools/log/tip/src/share/classes/com/sun/tools/javac/main/RecognizedOptions.java">com.sun.tools.javac.main.RecognizedOptions</a> and <a href="http://hg.openjdk.java.net/jdk7/jdk7/langtools/log/tip/src/share/classes/com/sun/tools/javac/file/JavacFileManager.java">com.sun.tools.javac.file.JavacFileManager</a>.</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-9097968240941710172013-05-10T12:10:00.001-06:002013-05-10T16:16:43.061-06:00Port Redirection Under Windows 7<p>Today I wanted to perform some benchmarking of an SQL application. All of the work up to this
point was on a single machine that hosted both the database and the application. As a result, SQL network traffic was flowing through the Loopback interface. I wanted to get a better idea of the effects of real network latency by putting the bytes out on the wire. I was too lazy to install the software or the database on another machine, so I wanted to set up a port reflector on a second Windows 7 machine.
</p>
<p>First, I tried using netcat. Well, strictly speaking it was the nmap version, <a href="http://nmap.org/ncat/">ncat</a>:</p>
<blockquote><code><pre>ncat -l -p 9999 --keep-open -c "ncat win7host 1433"</pre></code></blockquote>
<p>This listens on the local port 9999 and feeds every incoming request to a new ncat process that sends the incoming packets to the target host on the SQL Server port 1433. This worked, and introduced network latency. A <em>lot</em> of latency. The processes being created for each connection introduced so much delay that round-trip times resembled Internet latency. That was a little too much latency for my purposes.</p>
<p>As an alternative, I used the Windows netsh command:</p>
<blockquote><code><pre>netsh interface portproxy add v4tov4 listenport=9999 listenaddress=192.168.0.158 connectport=1433 connectaddress=win7host</pre></code></blockquote>
<p>This performed without any significant overhead introduced by the proxy itself. The port redirection was removed using:</p>
<blockquote><code><pre>netsh interface portproxy delete v4tov4 listenport=9999 listenaddress=192.168.0.158</pre></code></blockquote>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-68734391554977933462013-01-30T17:32:00.000-07:002013-01-31T11:18:39.597-07:00Java SAXParserFactory vs. XMLFilter<p>Using Java 7, I had problems trying to inject an XMLFilter into a SAX pipeline. My code
looked something like this:</p>
<blockquote><code><pre>SAXParserFactory saxFactory = SAXParserFactory.newInstance();
saxFactory.setNamespaceAware(true);
saxFactory.setValidating(false);
saxFactory.setSchema(schema);
XMLReader reader = saxFactory.newSAXParser().getXMLReader();
XMLFilter filter = new NamespaceRewriter();
filter.setParent(reader);
filter.setContentHandler(handler);
filter.parse(source);</pre></code></blockquote>
<p>The filter was a trivial extension of <em>XMLFilterImpl</em> that rewrote some of the document namespaces.
However, this code would fail. As can be seen, the <em>XMLReader</em> was configured to perform XML schema validation. The failure was that the reader would bypass the installed SAX filter when performing this validation. Using a debugger, I was able to verify at run-time that the reader's <em>ContentHandler</em> was correctly set to be my filter. But the reader was apparently performing the schema validation <em>prior</em> to invoking its <em>ContentHandler</em>.</p>
<p>I was able to work around this problem by inserting an explicit validation stage in the pipeline instead
of relying upon the validation baked into the <em>XMLReader</em> built by <em>SAXParserFactory</em>:</p>
<blockquote><code><pre>XMLReader reader = XMLReaderFactory.createXMLReader();
<strong>ValidatorHandler validator = schema.newValidatorHandler();</strong>
XMLFilter filter = new NamespaceRewriter();
filter.setParent(reader);
<strong>filter.setContentHandler(validator);</strong>
<strong>validator.setContentHandler(handler);</strong>
filter.parse(source);</pre></code></blockquote>
<p>The changes are <code><strong>emphasized</strong></code>.</p>
<p>I would say that this is unfortunate behaviour in the Java 7 implementation, but I lay the blame squarely
on the SAX API. The SAX API makes it devilishly difficult to build reader/filter chains correctly. In
particular, it is almost impossible to link independently assembled subchains into a single larger chain.
The API supports easy delegation of <em>parse</em> calls, but chaining content handlers is messy. A
filter is required to destructively replace the content handler of the next reader/filter
in the chain with an augmented content handler. If that next component has built-in behaviour that is
not expressed in the content handler, then that component cannot be wrapped successfully (e.g. the
problem at hand). Even if the XMLReader above had expressed the schema validation as an augmentation to
the content handler, how would a filter recover it? Presumably <em>getContentHandler()</em> is supposed
to return the unaugmented handler -- so how is one supposed to get the augmented one?</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-66410577473364872442013-01-10T15:39:00.002-07:002013-01-10T15:46:54.939-07:00SQL Server vs. EXISTS<p>Using Microsoft SQL Server, I had a query that retrieved rows from a view
of some complexity, i.e. something like this:</p>
<blockquote><code><pre>SELECT * FROM vwComplex WHERE ...</pre></code></blockquote>
<p>This query ran basically instantaneously, even though the view involved numerous UNION ALLs and joins on coalesced keys. My sample query criteria selected ~100 out of the view's ~80,000 rows.</p>
<p>In the general case, I needed a query that returned a single bit indicating whether the specific criteria will select at least one row. I tried this:</p>
<blockquote><code><pre>SELECT
CASE WHEN EXISTS (
SELECT * FROM vwComplex WHERE ...
) THEN 1 ELSE 0 END</pre></code></blockquote>
<p>I presumed that this statement would run faster than the original unadorned query since it can stop executing as soon as it finds the first row. My presumption was incorrect -- the query ran for a long, <i>long</i> time.</p>
<p>In this case, at least, it turns out that it is faster to run the full query than to perform an existence check:</p>
<blockquote><code><pre>SELECT SIGN(COUNT(*)) FROM vwComplex WHERE ...</pre></code></blockquote>
<p>No surprises here, of course. SQL query plan generation remains a mystical exercise.</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-64802023458673384312012-12-18T13:55:00.000-07:002012-12-18T13:55:07.462-07:00SQL Server Schema Differences<p>The following SQL statement will generate a crude list of column differences between two SQL Server databases:</p>
<blockquote><code><pre>DECLARE @oldDatabase NVARCHAR(MAX)
DECLARE @newDatabase NVARCHAR(MAX)
SET @oldDatabase = 'unittestnew_3_14_x'
SET @newDatabase = 'unittestnew_3_15_x'
DECLARE @sql NVARCHAR(MAX)
SET @sql = '
WITH
oldColumns AS (
SELECT
tab.name AS tableName
, ROW_NUMBER() OVER (PARTITION BY tab.name ORDER BY col.column_id) AS columnSeq
, col.name AS columnName
, typ.name AS type
, col.max_length AS length
, col.scale AS scale
FROM ' + @oldDatabase + '.sys.all_columns AS col
INNER JOIN ' + @oldDatabase + '.sys.all_objects AS tab
ON tab.object_id = col.object_id
AND tab.type = ''U''
INNER JOIN ' + @oldDatabase + '.sys.schemas AS sch
ON sch.schema_id = tab.schema_id
AND sch.name = ''dbo''
INNER JOIN ' + @oldDatabase + '.sys.systypes AS typ
ON typ.xusertype = col.system_type_id
)
, newColumns AS (
SELECT
tab.name AS tableName
, ROW_NUMBER() OVER (PARTITION BY tab.name ORDER BY col.column_id) AS columnSeq
, col.name AS columnName
, typ.name AS type
, col.max_length AS length
, col.scale AS scale
FROM ' + @newDatabase + '.sys.all_columns AS col
INNER JOIN ' + @newDatabase + '.sys.all_objects AS tab
ON tab.object_id = col.object_id
AND tab.type = ''U''
INNER JOIN ' + @newDatabase + '.sys.schemas AS sch
ON sch.schema_id = tab.schema_id
AND sch.name = ''dbo''
INNER JOIN ' + @newDatabase + '.sys.systypes AS typ
ON typ.xusertype = col.system_type_id
)
, matched AS (
SELECT
COALESCE(old.tableName, new.tableName) AS tableName
, old.columnSeq AS oldColumnSeq
, old.columnName AS oldColumnName
, old.type AS oldType
, old.length AS oldLength
, old.scale AS oldScale
, new.columnSeq AS newColumnSeq
, new.columnName AS newColumnName
, new.type AS newType
, new.length AS newLength
, new.scale AS newScale
FROM oldColumns AS old
FULL OUTER JOIN newColumns AS new
ON new.tableName = old.tableName
AND new.columnName = old.columnName
)
SELECT
tableName
, oldColumnName
, oldType
, oldLength
, oldScale
, CASE
WHEN oldColumnName IS NULL THEN ''added''
WHEN newColumnName IS NULL THEN ''deleted''
ELSE ''changed''
END AS diff
, newColumnName
, newType
, newLength
, newScale
FROM matched
WHERE oldColumnName IS NULL
OR newColumnName IS NULL
OR oldColumnName <> newColumnName
OR oldType <> newType
OR oldLength <> newLength
OR oldScale <> newScale
ORDER BY tableName
, COALESCE(oldColumnSeq, newColumnSeq)
'
EXEC sp_executesql @sql</pre></code></blockquote>MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-48338080421518920792012-12-13T10:52:00.001-07:002013-02-09T22:04:12.883-07:00Java vs. UTF-8 BOM<p>From time to time, one runs into Java software that has trouble handling a Unicode byte-order mark (BOM) in a UTF-8-encoded character stream. I've seen this, for example, in various XML processing pipelines. Most recently, I came across it in a character stream generated by a .NET component that was being consumed by a Java component. The problem is always the same -- the BOM finds its way into the data stream instead of being silently consumed by the Java I/O infrastructure.</p>
<p>This is a known problem, listed as <a href="http://bugs.sun.com/view_bug.do?bug_id=4508058">bug #4508058</a> on the Java bug parade. Oracle/Sun acknowledged the bug, and it was even briefly fixed in the "Mustang" (Java6) release. However, a follow-on bug report (<a href="http://bugs.sun.com/view_bug.do?bug_id=6378911">#6378911</a>) complained that the fix broke backwards compatibility with previous releases. So, the fix was ultimately withdrawn and the original bug was marked as "won't fix".</p>
<p>Bottom line: when writing Java components that consume UTF-8-encoded character streams, be prepared to consume the BOM yourself. Also, be aware that the various Microsoft I/O frameworks aggressively write BOMs into UTF-8 streams, even if the stream would otherwise be empty.</p>
<p>It seems to me that there is an opportunity here for Java to add an alternate method to construct UTF-8 readers that handle BOMs properly.</p>
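<p>Until then, a small wrapper that consumes the BOM on behalf of the caller is straightforward. The sketch below is illustrative only (the class and method names are invented, not part of any standard API):</p>

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackReader;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// A minimal sketch of a reader factory that consumes a leading UTF-8 BOM.
public class BomSkippingReader {

    // Wrap the stream in a UTF-8 reader and silently discard a leading
    // U+FEFF, which Java's UTF-8 decoder passes through as an ordinary
    // character instead of stripping.
    public static Reader utf8Reader(InputStream in) throws IOException {
        PushbackReader reader = new PushbackReader(
                new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        int first = reader.read();
        if (first != 0xFEFF && first != -1) {
            reader.unread(first); // no BOM: push the character back
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 BOM bytes (EF BB BF) followed by "hello"
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                          'h', 'e', 'l', 'l', 'o'};
        BufferedReader r = new BufferedReader(
                utf8Reader(new ByteArrayInputStream(withBom)));
        System.out.println(r.readLine()); // prints "hello" -- BOM consumed
    }
}
```

<p>The same wrapper is harmless on streams that carry no BOM, since any other first character is simply pushed back.</p>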
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-50533370348442749252012-12-12T14:49:00.000-07:002012-12-12T16:26:14.970-07:00curl vs. Windows vs. esi.manage measure export<p>A self-contained Windows command file can use <em>curl</em> to invoke the <em>esi.manage</em> measure export integration service through judicious use of newline and quote escaping:</p>
<blockquote><code><pre>@ECHO OFF
SETLOCAL
SET server=127.0.0.1:8080
SET request=^
-h ^
-g PROJECT.NAME ^
-c none -m PRODUCTION -b GROSS -v FORECAST -U Oil:boe -D NONE --curtailment ^
-c scheduled -m PRODUCTION -b GROSS -v FORECAST -U Oil:boe -D SCHEDULED --curtailment ^
-c unscheduled -m PRODUCTION -b GROSS -v FORECAST -U Oil:boe -D UNSCHEDULED --curtailment ^
-c all -m PRODUCTION -b GROSS -v FORECAST -U Oil:boe -D ALL --curtailment ^
2010-01-01 ^
18 M ^
\"origin subset\"
curl ^
--ntlm ^
--user : ^
-X POST ^
-H "content-type:text/plain" ^
--data-binary "%request%" ^
http://%server%/3esi/manage/integration/export
</pre></code></blockquote>
<p>The use of the <code>--data-binary</code> option is important since <code>-d</code> or <code>--data-ascii</code> will reformat the data (e.g. strip newlines).</p>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-63727042194120862172012-10-05T13:25:00.001-06:002012-10-05T13:26:42.370-06:00Serializing Run-Time Types with gson<p><a href="http://code.google.com/p/google-gson/">gson</a> is a JSON serialization library for Java. It is optimized for round-trip serialization and deserialization between Java and JSON representations of objects. To support such round trips, the serializer operates upon the static types of an object tree. As a result, the dynamic (run-time) types of objects are generally ignored.</p>
<p>This can be inconvenient for simple, output-only serialization scenarios. Consider what happens if we try to serialize, say, a <code>List<MyClass></code>. When gson serializes this list, the output will include only the fields declared by the element type (<code>MyClass</code>); fields introduced by each element's run-time type will <em>not</em> be serialized. If <code>MyClass</code> is an interface, then there are no declared fields to serialize at all, so such elements will always be serialized as an empty JSON object (<code>{}</code>).</p>
<p>One way to work around this is to implement a <code>TypeAdapterFactory</code> that handles all interface types:</p>
<blockquote><code><pre>final class MyAdapterFactory implements TypeAdapterFactory {
@Override
public <T> TypeAdapter<T> create(final Gson gson, TypeToken<T> type) {
Class<? super T> rawType = type.getRawType();
if (rawType.isInterface()) {
return new TypeAdapter<T>() {
@Override
public void write(JsonWriter out, T value) {
Type valueType = null == value ? Object.class : value.getClass();
gson.toJson(value, valueType, out);
}
@Override
public T read(JsonReader in) {
// we only care about serialization in this scenario
throw new IllegalStateException();
}
};
}
return null;
}
}</pre></code></blockquote>
<p>This adapter factory will delegate serialization of an object of any interface class to a custom <code>TypeAdapter</code> that inspects the run-time type of the object.</p>
<p>The factory is installed using <code>GsonBuilder</code>:</p>
<blockquote><code><pre>GsonBuilder gsonBuilder = new GsonBuilder();
gsonBuilder.registerTypeAdapterFactory(new MyAdapterFactory());
// ...
Gson gson = gsonBuilder.create();</pre></code></blockquote>
MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.comtag:blogger.com,1999:blog-4032531870936831330.post-57155797887731819712012-09-17T17:28:00.000-06:002012-09-17T17:32:28.273-06:00How Not to Use For in Mathematica<p>Someone was asking about this the other day...</p>
<p>StackOverflow once had the following question under the Mathematica tag:</p>
<blockquote><em>How could I calculate the trace of a n*n matrix, with a for loop? I know that the trace is the sum of the terms on the diagonal, but I haven't got a clue how to do it...</em></blockquote>
<p>The question has been deleted now, but before that happened I couldn't resist the devilish temptation to give an impertinent answer. For those who can view deleted questions on StackOverflow, the page <a href="http://stackoverflow.com/questions/8534793/wolfram-mathematica-7-matrix">still exists</a>. For those who can't, my response is quoted below...</p>
<blockquote>
<p>Let's start with a 5x5 matrix, <code>m</code>:</p>
<pre><code>m = Partition[Range[25], 5];
m // MatrixForm
</code></pre>
<p>which looks like this:</p>
<pre><code> 1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
</code></pre>
<p>In Mathematica, there are many ways to use <code>For</code> to compute the trace. Here is a simple way:</p>
<pre><code>For[tr=0; i=0, i<10000, ++i, tr+=RandomInteger[10]; tr]; Tr[m]
</code></pre>
<p>More efficient ways include:</p>
<pre><code>Unevaluated[For[tr=0; i=1, i <= Dimensions[m][[1]], ++i, tr += m[[i,i]]]; tr] /.
_For :> (tr = Tr[m])
</code></pre>
<p>or</p>
<pre><code>For[,True,,Return[Tr[m], For]]
</code></pre>
<p>or</p>
<pre><code>For[,False,,] /. Null -> Tr[m]
</code></pre>
<p>;)</p>
</blockquote>MMhttp://www.blogger.com/profile/05886743369379676368noreply@blogger.com