Parsing very small XML: beware of overheads

I recently needed to parse some very tiny chunks of XML, for a toy project related to OpenStreetMap data.

The data ranged from:

  • a single node with 3 or 4 small attributes
  • 10-20 nodes with a total of 50-80 attributes

My first reaction was to just use the DOM API. Performance was terrible. I checked the usual suspects: the DocumentBuilderFactory was correctly kept around, and the DocumentBuilder itself was correctly reused between XML documents.
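For reference, the reuse described above looks roughly like this (a minimal sketch; the class name and XML snippet are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomReuse {
    public static void main(String[] args) throws Exception {
        // Create the factory and the builder ONCE, outside the parsing loop.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        String xml = "<node id=\"1\" lat=\"48.8\" lon=\"2.3\"/>";
        for (int i = 0; i < 3; i++) {
            builder.reset(); // safe reuse of the same builder between documents
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            System.out.println(doc.getDocumentElement().getTagName());
        }
    }
}
```

Even with this correct reuse pattern, each parse call was still dominated by per-call overhead, as shown next.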

It is common knowledge that “SAX parsing is faster than DOM”, and that is true to some extent. DOM definitely can’t handle very large XML documents, since everything is instantiated in memory at once, but building the DOM tree itself is often quite quick compared to the XML parsing that precedes it.

So I switched to SAX… no real improvement. Time for the most basic performance monitoring tool available: a thread dump, obtained with kill -3:

 java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
- locked <0x00000007ad82ba98> (a
at java.lang.Throwable.<init>(
at java.lang.Exception.<init>(
at java.lang.RuntimeException.<init>(
at SAXParsersTest.timeOne(

And there is our answer. On very small XML documents, a huge part of the time is actually spent in the startup of the parser, not in the parsing itself. Each call to parse does some entity-management initialization which, somewhere, throws an exception (probably because a property is not present). Throwing exceptions is expensive and not something the JIT optimizes away (filling in the stack trace dominates, as the dump shows), so you pay a huge overhead on every call.
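To get a feeling for how expensive exception construction is, here is a minimal sketch (the class name and loop count are my own, not from the benchmark) comparing a plain object allocation with constructing a RuntimeException, whose constructor calls fillInStackTrace:

```java
public class ExceptionCost {
    public static void main(String[] args) {
        int loops = 100_000;

        // Baseline: a trivial allocation. Note the JIT may eliminate this
        // dead allocation entirely, which only strengthens the contrast.
        long t0 = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            new Object();
        }
        long plain = System.nanoTime() - t0;

        // Constructing (not even throwing) an exception fills in the
        // stack trace, which is the costly native call seen in the dump.
        t0 = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            new RuntimeException("boom");
        }
        long exceptional = System.nanoTime() - t0;

        System.out.println("plain object : " + plain / 1_000_000 + " ms");
        System.out.println("exception    : " + exceptional / 1_000_000 + " ms");
    }
}
```

The exact ratio depends on JVM and stack depth, but the exception loop is reliably the slower of the two.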

Fortunately, JAXP makes it easy to swap the parser implementation. We can, for example, go from the default one (Xerces) to the Piccolo implementation. Piccolo publishes benchmarks showing up to a 2x gain on “small” XML (500 bytes).
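Swapping the implementation is essentially a one-line change through the standard JAXP factory lookup. A hedged sketch follows; the Piccolo factory class name shown in the comment is my assumption and requires the Piccolo jar on the classpath, so it is left commented out and the code below runs against the default parser:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SwitchParser {
    public static void main(String[] args) throws Exception {
        // Hypothetical: point JAXP at Piccolo's factory. Without this
        // property, newInstance() returns the JDK default (Xerces).
        // System.setProperty("javax.xml.parsers.SAXParserFactory",
        //                    "com.bluecast.xml.JAXPSAXParserFactory");

        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        System.out.println("Using: " + parser.getClass().getName());

        String xml = "<node id=\"1\" lat=\"48.8\"/>";
        parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes atts) {
                    System.out.println(
                        qName + " has " + atts.getLength() + " attributes");
                }
            });
    }
}
```

The rest of the code is untouched: the handler works identically whichever implementation the factory resolves to.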

What happens on very tiny XML?

  • Times are in milliseconds for 100K loops
  • All loops have been warmed up beforehand to eliminate any JIT effect
  • I’ve added the old Crimson parser for comparison
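The methodology above can be sketched as a simple harness (illustrative only; the class name, XML snippet and warm-up count are my own):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParseBench {
    static void parseOnce(SAXParser parser, byte[] xml) throws Exception {
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
    }

    public static void main(String[] args) throws Exception {
        byte[] tiny = "<node id=\"1\" lat=\"48.8\" lon=\"2.3\"/>"
                .getBytes("UTF-8");
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        // Warm-up pass so the JIT compiles the hot path before measuring.
        for (int i = 0; i < 10_000; i++) parseOnce(parser, tiny);

        int loops = 100_000; // same loop count as the table
        long t0 = System.nanoTime();
        for (int i = 0; i < loops; i++) parseOnce(parser, tiny);
        System.out.println((System.nanoTime() - t0) / 1_000_000
                + " ms for " + loops + " loops");
    }
}
```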


                           Piccolo    Xerces   factor   Crimson
"empty"  (1 node, 1 attr)       80      1153     14.4       371
tiny     (a few nodes)          81      1161     14.3       368
lessTiny (a few dozen)         490      1654      3.4       958
big      (hundreds)           4015      4638      1.2      5590
veryBig  (several K)         39200     36834      0.9     52240

Yup: when parsing a large number of extremely small XML fragments with only a few nodes, Piccolo is 14 times faster than the default implementation! All taken into account, this led to a 5-10x speedup of my OSM parsing job.
On very big XML, both implementations are almost head-to-head.

So, I would probably not recommend switching your default implementation, as Xerces is more mature and likely supports more exotic features, but if you have lots of very small XML to parse, then definitely go for Piccolo! And beware of startup times and overheads in your code, not only of raw sustained throughput.

Because performance does matter…

Software performance is a very complex subject. There are a huge number of factors that play a role in “how fast my software will be”.

Software developers must always keep performance in mind, together with maintainability, extensibility and pace of development. On the other hand, it is very easy to become obsessed with it: trying to optimize too early is one of the most common mistakes. And even when you do need to optimize, another risk is jumping into micro-optimizations, like shaving a few instructions off a tight loop, without thinking about the big picture or knowing exactly where the problem lies.

“Performance” is also very vague by itself. Are we talking about speed of execution, memory consumption, disk space, …? Depending on your program, the focus can be very different.

I initially wanted to create a blog dedicated to this topic, but I decided that a category on this website would be a simpler addition. It will focus on how to analyze the performance of your software, how to know where and when you should optimize, and some optimization techniques. The articles will not follow a fully logical path, but hopefully they will add up to a structured how-to. The examples and data will be mostly about Java and C/C++.

We’ll start with some general insights on the optimization process, then with a series about object pooling and reuse, and memory allocation.

Tracking the edits on OpenStreetMap in real time

What is even better than analyzing and visualizing interesting data? Doing it on open data… from OpenStreetMap… and in real time!

Somewhat inspired by this view of data uploads to OpenWeatherMap, Christian Quest and I created this service.

It displays, in near real time (with data from two minutes ago), all edits made on OpenStreetMap, with live activity graphs and zooms on the edited areas.

It is very exciting for an existing contributor to see their work displayed and broadcast so quickly. But above all, this tool is intended as a communication tool: a very visual demonstration of the activity and energy of the OSM project. Initial tests showed very positive reactions from sample audiences.

The tool also works well as a visual eye-catcher for a booth at an exhibition, for example.

We now intend to leverage the backend to extract more insightful data that, beyond the communication aspect, can provide real services to OSM contributors and users. The main idea is an “interest feed”: you could subscribe to a bounding box, or to some tags in the OSM data, and get notified when changes are made.

For example:

  • Discover who works in your area
  • If you are a transportation operator, for example, get notified when changes are made to OSM transportation data, perhaps in the context of an open data initiative

Stay tuned for more announcements related to this service!