November 10, 2003

XML Best Practices

The right way to understand XML is not as a technology but as a technique. The constellation of XML standards is just a toolset, and not every XML standard is useful. The important thing is to be able to recognize strong and weak uses of XML.

There is one overarching "why" behind XML technology: it enables an important, obvious technique, but one that is too often ignored. This "XML technique" provides an excellent framework for deciding how and when - or when not - to apply any specific XML technology.

XML is Both Human- and Machine-Readable

All the elaborate standards surrounding XML come down to one basic value:

XML is both human- and machine-readable

By providing a way for a wide range of data to be exchanged directly by computers in a human-readable way, XML dramatically lowers the cost of development. For example, if something is wrong with a network of systems that exchange XML files, an ordinary developer can view or alter the actual messages easily. All it takes is an ordinary text editor.

Human Readable                                     Machine Readable
Editable in "vi"                                   Decoding rules are formally prescribed
Text data is marked up using mnemonic tags         Strict rules for names, namespaces, hierarchy
Hierarchical structure is fully self-describing    Shape formally verifiable using DTD or XML Schema
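
To make the "machine readable" column concrete, here is a minimal sketch of those strict rules at work, assuming Python and its standard-library xml.etree.ElementTree parser. The helper name is_well_formed is made up for this illustration. Note that it checks only well-formedness (proper nesting and naming), not validity against a DTD or XML Schema, which this standard-library parser does not enforce.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser refuses anything that breaks the nesting or naming rules.
        return False

print(is_well_formed("<note><to>Ann</to></note>"))   # properly nested
print(is_well_formed("<note><to>Ann</note></to>"))   # tags closed in the wrong order
```

A human can spot the second mistake in a text editor, and the parser rejects it mechanically; both readers are applying the same rules.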

The same qualities make it easy to understand how to write programs that produce XML - starting from an example document, it is trivially easy to write a "printf" program to write out XML text. Human readability also makes it easy to write programs that consume XML - the self-describing hierarchical layout of a document shows you exactly how to traverse an XML tree data structure. There is no prerequisite reading of thick standards manuals to write a simple program to process an XML document.
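
As a rough sketch of both halves of that claim, assuming Python: the writer below is nothing more than printf-style string formatting, and the reader walks the parsed tree in the same shape the text shows. The order/item document layout and the function names write_order and read_order are invented for this example. The one thing a "printf" writer must not forget is escaping the special characters in the data.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def write_order(customer, items):
    """Produce XML text with plain printf-style formatting."""
    # escape() handles &, < and >; the extra mapping covers quotes in attributes.
    lines = ['<order customer="%s">' % escape(customer, {'"': "&quot;"})]
    for name, qty in items:
        lines.append('  <item qty="%d">%s</item>' % (qty, escape(name)))
    lines.append("</order>")
    return "\n".join(lines)

def read_order(text):
    """Consume the same XML by walking the tree the way the text reads."""
    root = ET.fromstring(text)
    items = [(item.text, int(item.get("qty"))) for item in root.findall("item")]
    return root.get("customer"), items

xml_text = write_order("Acme & Sons", [("widget", 2), ("gear <small>", 5)])
print(xml_text)              # the raw document, readable in any text editor
print(read_order(xml_text))  # the same data, recovered by a program
```

The reader was written by looking at the printed document, not at a specification: one order element, one customer attribute, a list of item children.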

XML is valuable because it is a rigorous machine data format that is easy for people to understand. Do not get distracted by the hundreds of other features provided by the various XML standards. No other XML feature is as important.

How to Recognize Good XML Technique

It is not hard to recognize when good XML techniques are being used in a development effort. Here are the two things that you see:

  1. More than one person is reading the same XML in a vanilla editor. When you see this, it probably means that your next new programmer will be able to understand your critical data too.
  2. More than one program is reading the same XML using different parsing code. When you see this, it probably means that your next legacy system can be made to understand your precious data too.

XML is being used well when it is helping both people and computers to communicate with each other.
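
The second sign can be sketched in a few lines, assuming Python: two independently written readers, using two different standard-library APIs (ElementTree and minidom), pull the same data out of the same document. The inventory/part layout and the function names are invented for illustration, and in practice the two readers would more likely live in separate programs, possibly in different languages.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# A small document that a person can read in any text editor.
DOC = """<inventory>
  <part id="p1">bolt</part>
  <part id="p2">nut</part>
</inventory>"""

def parts_via_etree(text):
    """First reader: ElementTree API."""
    root = ET.fromstring(text)
    return {part.get("id"): part.text for part in root.iter("part")}

def parts_via_dom(text):
    """Second reader: W3C-style DOM API."""
    dom = minidom.parseString(text)
    return {
        part.getAttribute("id"): part.firstChild.data
        for part in dom.getElementsByTagName("part")
    }

# Both readers recover the same mapping from part id to part name.
print(parts_via_etree(DOC) == parts_via_dom(DOC))
```

Neither reader needed to know the other existed; the visible document was the whole contract between them.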

On the other hand, it is also easy to recognize XML misuse:

  1. Nobody knows how to read the raw XML. If it scares programmers to think about using a plain text editor to look at your XML, it probably means that your XML was made purely for computer consumption. It is probably not doing much better than a bytestream would.
  2. Only one program can read the XML. If nobody seems to be capable of writing another parser for your XML format, then it has become just another opaque data format for you. Why even use XML?

That's right. Since the whole purpose of XML is to ensure that multiple people and multiple programs can work with your data, you sacrifice most of the value of XML as soon as you anoint a solitary "XML expert" programmer or a lone "XML processor" program for your XML data. Lose that one programmer, or obsolete that one system, and you will understand the costs.

Let your XML Show

Robustly interoperable, safe data exchange is all about fully visible, understandable data. It must not be confused with "encapsulation", which is the object-oriented technique for taming complexity by exposing behavior and hiding data. XML technique is exactly the opposite: it reduces complexity by showing all the data and hiding any underlying behavior.

Whereas an object-oriented programmer would say "I don't care how the data is implemented, I just care what the interface does," a data-oriented XML programmer says "I don't care what the implementation does, I just care what the data contains."

The upshot: to apply good XML technique, let your XML show.

Posted by David at November 10, 2003 05:08 PM

is this still work?

Posted by: skanfd at January 9, 2021 11:48 AM