Weakly Reachable: 2013

2013-10-08

Inspecting the SQL Server Plan Cache

While investigating an insidious case of SQL Server query plan cache poisoning, I looked for a way to see the cached plans. Here is what I used:

WITH data AS (
  SELECT
    c.usecounts AS uses
  , s.max_worker_time AS max_time
  , CAST(
      p.query_plan.query('
        declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
        for $d in distinct-values(//ColumnReference/@Database)
        order by $d
        return data($d)
      ') AS NVARCHAR(MAX)) AS databases
  , CAST(
      p.query_plan.query('
        declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
        for $t in distinct-values(//ColumnReference/@Table)
        order by $t
        return data($t)
      ') AS NVARCHAR(MAX)) AS tables
  , c.objtype
  , t.text AS sql
  , p.query_plan
  , s.last_execution_time AS executed
  , s.execution_count AS runs
  , s.creation_time AS created
  , c.plan_handle
  , s.sql_handle
  , s.query_hash
  , s.query_plan_hash
  FROM sys.dm_exec_cached_plans  AS c
  LEFT JOIN sys.dm_exec_query_stats AS s
    ON s.plan_handle = c.plan_handle
  CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) AS t
  CROSS APPLY sys.dm_exec_query_plan(c.plan_handle) AS p
)
SELECT TOP 100 *
FROM data
ORDER BY max_time DESC

Note the final two hash columns. Their presence is to help identify when different plans have been cached for a single SQL statement that has been executed with different parameter values.

Cached plans can be cleared using DBCC FREEPROCCACHE or DBCC FREESYSTEMCACHE. When benchmarking SQL, it can be useful to discard SQL Server's internal buffers. This can be done without restarting the server thus:

CHECKPOINT
GO
DBCC DROPCLEANBUFFERS

On a related note, the query hint OPTIMIZE FOR UNKNOWN can be used to essentially eliminate parameter-sniffing -- reducing non-deterministic optimizer behaviour.

2013-08-09

Language Constructs

C programmers, whose language maps directly to machine instructions, scoff at the bloat, tediousness and inefficiencies of ultra-high-level languages like C++.

Java programmers, whose language constructs map directly to the output of paste -d '\n' <(man cobol) <(man simula), rejoice that they do not have to deal with the bit-twiddling and low-level trivia of machine languages like C++.

LISP programmers, whose language ~~maps directly to~~ is the metaphysical substrate of the cosmos, are confused by these opinions since they see no distinction between high-level and low-level code.

Perl programmers dismiss the whole debate as academic since their language maps directly to the structure of a human mind (named Larry).

2013-06-04

Using Internal Java Implementation Classes

The build of our Java 7 application failed today. Someone had referenced an internal Java implementation class within our code, namely com.sun.xml.internal.txw2.output.IndentingXMLStreamWriter. We use Eclipse, and this reference compiled and ran fine within the IDE. However, the PDE build would fail saying cannot find symbol. The difference can be explained when one realizes that the IDE compiles using the Eclipse compiler, but the PDE build uses javac.

However, one might still be stumped as IndentingXMLStreamWriter can be found within rt.jar. It turns out that javac does not actually compile directly against rt.jar. Instead, it references the file lib/ct.sym. This file is essentially a zip archive containing all of the "blessed" classes. IndentingXMLStreamWriter is not in that file.

It is best to avoid using internal implementation classes, but there is a work-around if one insists. The undocumented compiler argument -XDignore.symbol.file will use rt.jar directly.

The details of this mechanism can be found by inspecting the OpenJDK source files com.sun.tools.javac.main.RecognizedOptions and com.sun.tools.javac.file.JavacFileManager.

2013-05-10

Port Redirection Under Windows 7

Today I wanted to perform some benchmarking of an SQL application. All of the work up to this point was on a single machine that hosted both the database and the application. As a result, SQL network traffic was flowing through the Loopback interface. I wanted to get a better idea of the effects of real network latency by putting the bytes out on the wire. I was too lazy to install the software or the database on another machine, so I wanted to set up a port reflector on a second Windows 7 machine.

First, I tried using netcat. Well, strictly speaking it was the nmap version, ncat:

ncat -l -p 9999 --keep-open -c "ncat win7host 1433"

This listens on the local port 9999 and feeds every incoming request to a new ncat process that sends the incoming packets to the target host on the SQL Server port 1433. This worked, and introduced network latency. A lot of latency. The processes being created for each connection introduced so much delay that throughput was similar to Internet latency. That was a little too much latency for my purposes.

As an alternative, I used the Windows netsh command:

netsh interface portproxy add v4tov4 listenport=9999 listenaddress=192.168.0.158 connectport=1433 connectaddress=win7host

This performed without any significant overhead introduced by the proxy itself. The port redirection was removed using:

netsh interface portproxy delete v4tov4 listenport=9999 listenaddress=192.168.0.158

2013-01-30

Java SAXParserFactory vs. XMLFilter

Using Java 7, I had problems trying to inject an XMLFilter into a SAX pipeline. My code looked something like this:

SAXParserFactory saxFactory = SAXParserFactory.newInstance();
saxFactory.setNamespaceAware(true);
saxFactory.setValidating(false);
saxFactory.setSchema(schema);

XMLReader reader = saxFactory.newSAXParser().getXMLReader();

XMLFilter filter = new NamespaceRewriter();

filter.setParent(reader);
filter.setContentHandler(handler);

filter.parse(source);

The filter was a trivial extension of XMLFilterImpl that rewrite some of the document namespaces. However, this code would fail. As can be seen, the XMLReader was configured to perform XML schema validation. The failure was that the reader would bypass the installed SAX filter when performing this validation. Using a debugger, I was able to verify at run-time the reader's ContentHandler was correctly set to be my filter. But the reader was apparently performing the schema validation prior prior to invoking its ContentHandler.

I was able to work around this problem by inserting an explicit validation stage in the pipeline instead of relying upon the validation baked into the XMLReader built by SAXParserFactory:

XMLReader reader = XMLReaderFactory.createXMLReader();

ValidatorHandler validator = schema.newValidatorHandler();

XMLFilter filter = new NamespaceRewriter();

filter.setParent(reader);
filter.setContentHandler(validator);
validator.setContentHandler(handler);

filter.parse(source);

The changes are emphasized.

I would say that this is unfortunate behaviour in the Java 7 implementation, but I lay the blame squarely on the SAX API. The SAX API makes it devilishly difficult to build reader/filter chains correctly. In particular, it is almost impossible to link independently assembled subchains into a single larger chain. The API supports easy delegation of parse calls, but chaining content handlers is messy. A filter is required to destructively replace the content handler of the next reader/filter in the chain with an augmented content handler. If that next component has built-in behaviour that is not expressed in the content handler, then that component cannot be wrapped successfully (e.g. the problem at hand). Even if the XMLReader above had expressed the schema validation as an augmentation to the content handler, how would a filter recover it? Presumably getContentHandler() is supposed to return the unaugmented handler -- so how is one supposed to get the augmented one?

2013-01-10

SQL Server vs. EXISTS

Using Microsoft SQL Server, I had a query that retrieved rows from a view of some complexity, i.e. something like this:

SELECT * FROM vwComplex WHERE ...

This query ran basically instantaneously, even though the view involved numerous UNION ALLs and joins on coalesced keys. My sample query criteria selected ~100 out of the view's ~80,000 rows.

In the general case, I needed a query that returned a single bit indicating whether the specific criteria will select at least one row. I tried this:

SELECT
  CASE WHEN EXISTS (
    SELECT * FROM vwComplex WHERE ...
  ) THEN 1 ELSE 0 END

I presumed that this statement would run faster than the original unadorned query since it can stop executing as soon as it finds the first row. My presumption was incorrect -- the query ran for a long, long time.

In this case, at least, it turns out that it is faster to run the full query than to perform an existence check:

SELECT SIGN(COUNT(*)) FROM vwComplex WHERE ...

No surprises here, of course. SQL query plan generation remains a mystical exercise.

Weakly Reachable