alan dipert

Wed, Nov 17th

An XML Rant

This rant is brought to you by my reaction to Daniel Lemire’s post, “You probably misunderstand XML”

In school I took a class called “Data Interchange” or something, that taught primarily XML and the ecosystem around it.  We learned about XSD, RELAX NG, XSLT, XPath, JAXB, SOAP, and lots of other things I thought to be loads of horrendous bullshit at the time.  I hated that class.

I made a special effort to go, though.  I was on a personal mission to point out every counterexample to, failing of, and argument against XML that I could.  “It’s the illegitimate half-brother of the holiest representation of all time, the S-expression, and it’s slower than shit to boot!” I cried.

I’ve learned a lot since then.  The biggest thing I’ve learned is that it’s easy to dismiss any technology with cultural baggage on religious terms.  It’s also easy to conflate your hatred of some technology, like SOAP, with some other technology, like XML, that is only loosely or incidentally related.  Until you have honestly asked yourself whether SOAP sucks because it uses XML, or SOAP just plain sucks, you have not adequately justified your hatred.

The more you work without asking yourself these questions, the more likely you are to commit the cardinal sin of anyone in the computing profession who gives a shit about what they do - use the wrong tool for the job.

What’s much harder, and much more useful, is to take a pragmatic approach to determining a technology’s deficiencies and capabilities.  I still don’t do this enough.  This was one of the themes of Rich Hickey’s “Hammock Time” talk at the first Clojure Conj.

But it’s not what led me to later realize the legitimate use cases for XML.  I digress.

What I later realized about XML as I programmed more is that the intrinsic structure, semi-structure, or non-structure of the data you are working with is only part of the question, and only leads you to a conclusion - well, opinion, mostly - on how to best represent it.  The more complicated the domain, the more complicated your opinion, and the hairier the ultimate representation.

As you bang out your implementation, the less intuitive but really insidious part of the question lurks: “Will people in the future working with this thing I created be able to use it and modify it for their own needs?  Will they know my intent well enough to improve upon it, or to determine if some variant of it is ‘valid’ in the sense that it is true to my original opinion, on which other consumers of this thing may depend?”

For domains that are simple in the ontological sense, it seems really dumb to use XML, because that question doesn’t seem worth addressing.

That’s because you’re confident other people could sit down, read a few of your examples, and be comfortable adapting and extending your stuff.  JSON and YAML are great for applications that fit this scenario, especially when you don’t have to worry about transport to or from places outside of your control, like in a web app when you’re doing Ajax stuff with your own backend.

Sometimes, though, you need to transfer data between applications - applications that are coded by different people, who might have different perspectives on the underlying domain that the data being transported is derived from.  And that’s when you need to look into XML, because of all the formats out there, it has the most tooling and documentation around writing schemas.  XML schemas let you express your understanding of the domain, and how you chose to interpret pieces of it, declaratively, in an almost human-readable format.  More readable than BNF in my opinion, anyway.

It’s easy to know when you’re doing it wrong.  Is your code littered with assertions about the structure of data coming from some outside source?  Are there failure cases for your application that can be triggered by malformed data sneaking in?  Do you have tons of unit tests around this?
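
To make that smell concrete, here is a minimal Python sketch of the kind of hand-rolled checking I mean.  The <order> document shape and the parse_order function are made up for illustration; they don’t come from any real application.

```python
import xml.etree.ElementTree as ET


def parse_order(xml_text):
    """Parse a hypothetical <order> document, checking its structure by hand."""
    root = ET.fromstring(xml_text)
    # Every check below encodes an opinion about the format that lives
    # only in this one code base.
    if root.tag != "order":
        raise ValueError("expected <order> root, got <%s>" % root.tag)
    order_id = root.get("id")
    if order_id is None:
        raise ValueError("<order> is missing its id attribute")
    items = root.findall("item")
    if not items:
        raise ValueError("<order> must contain at least one <item>")
    parsed = []
    for item in items:
        sku = item.get("sku")
        if sku is None:
            raise ValueError("<item> is missing its sku attribute")
        qty_text = (item.text or "").strip()
        if not qty_text.isdecimal():
            raise ValueError("<item> quantity must be a non-negative integer")
        parsed.append((sku, int(qty_text)))
    return order_id, parsed
```

Nothing is wrong with any single check.  The trouble is that every other application reading the same format has to rediscover and re-implement the same rules.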

If you do, you might be comfortable.  You might feel good.  But what about the other guys?  If you have multiple developers working on multiple code bases, all of which depend on some common interchange format, then the data on Bob’s box running application Q actually depends on the unit tests running on Sally’s box running application Z.  You can commit another cardinal sin and duplicate the tests, you can version and integration-test all of these apps together, or you can suck it up and investigate a transfer format that supports schemas.  Depending on your transport and other requirements, this might be XML, or it might be something like Thrift.
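
Here is a rough Python sketch of what the schema approach buys you.  It is a toy: the ORDER_RULES dict and the violations function are hypothetical stand-ins for a real schema language like XSD or RELAX NG plus a real validator, and the <order> format is invented.  The point is only that the rules live in one declarative description that every application can share, instead of in each application’s tests.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarative description of an <order> document.  In practice
# this role would be played by an XSD or RELAX NG schema, not a Python dict.
ORDER_RULES = {
    "tag": "order",
    "required_attrs": ["id"],
    "children": {
        "item": {"min": 1, "required_attrs": ["sku"]},
    },
}


def violations(xml_text, rules):
    """Return a list of human-readable rule violations (empty if valid)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != rules["tag"]:
        # Wrong root element: the remaining rules do not apply.
        problems.append("root must be <%s>" % rules["tag"])
        return problems
    for attr in rules["required_attrs"]:
        if root.get(attr) is None:
            problems.append("<%s> needs attribute %r" % (root.tag, attr))
    for child_tag, child_rules in rules["children"].items():
        found = root.findall(child_tag)
        if len(found) < child_rules["min"]:
            problems.append(
                "need at least %d <%s>" % (child_rules["min"], child_tag))
        for child in found:
            for attr in child_rules["required_attrs"]:
                if child.get(attr) is None:
                    problems.append(
                        "<%s> needs attribute %r" % (child_tag, attr))
    return problems
```

Bob and Sally would both run incoming documents through the same description, and a change to the format becomes a change to one file that both of them can read.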

Schemas are not the answer to every question, and, like any tool, they can be the wrong one for the job.  But in my limited experience, there’s a place for XML and it is easy to overlook because of your own cultural bias.
