The freedom of "not knowing" how XML markup will be used
There's an ongoing and fascinating debate raging in the XML community right now about the supposed efficiencies of binary versus textual data formats, based on the presumption that binary is always faster than text. Check out Leigh Dodds' "XML Deviant" column at http://www.xml.com/pub/a/2001/04/18/binaryXML.html. If you read that piece with some care, you'll see that it just ain't necessarily true that binary data representations are always faster to read into a program than text representations, even when--as is the case with XML markup--that text data must be parsed and a corresponding document object model constructed.
In the course of mulling over the incredible temptation that developers often feel to create fast, streamlined (but ultimately human-unreadable) formats for data, Dodds raises some wonderful points that not only validate many of the advantages I've been claiming for XML since I started writing these tips, but also include some startling revelations:
- XML's biggest appeal and utility come from its human readability. Developers and markup jockeys can read the markup directly to figure out what's going on. This greatly lowers the probability that parser, interpreter, or runtime handling errors can systematically mangle such markup without anyone noticing.
- Numerous XML developers report anecdotal and more formal evidence that in many cases, parsing and building the DOM for an XML document occurs faster than internalizing some binary representation of the same data.
- "Lazy" parsing and evaluation techniques--where only referenced nodes in a document are parsed and the DOM is created piecemeal only for parsed elements--sometimes mean that only a small portion of a document needs to be processed. This can't help but be faster than internalizing an entire binary representation of a whole document (and very few binary readers are smart enough to use lazy techniques).
- Compact text-based markup--iMode's stripped down version of HTML is a good example--can often be processed much faster than more complex binary data. Software is very good at processing plain text, and that's all good markup contains.
- Deriving efficient, compact, and useful binary encodings is much more difficult than designing efficient, compact, and useful markup languages. Effective design of binary formats is nowhere near as well documented or understood as effective design of markup languages, either.
- Because they're anything but transparent, binary formats are also inherently closed forms of notation. All too often, they're proprietary to boot. Properly constructed XML markup, by contrast, is open, transparent, readable, and seldom proprietary. Which horse would you rather bet on?
- An open, transparent markup is easier to adopt, especially within groups of developers from multiple organizations, because issues related to "not invented here" and "my code is better than your code" become entirely secondary, if not irrelevant, to ongoing development work. Of course, this is as much a benefit of the standardization of XML markup as it is a benefit of XML itself, but that's a good thing, too.
- Best of all, because XML markup carries structure and organization for data along with the data it represents, it remains open to the broadest possible interpretations and uses. Quoting from Walter Perry, Dodds makes the point that "once on the Internet you no longer know how, or by whom (or even what), your data will be processed, so you cannot make any assumptions about how it will be used." And finally, there's the notion that once captured in XML form, information can be transformed into any format imaginable--including arbitrary binary formats, in fact--so that delivering data captured in XML to just about any consumer is eminently doable.
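The point above about processing only part of a document can be illustrated with a small Python sketch. It is only a rough stand-in: it uses the standard library's incremental `iterparse` with an early exit, which is a streaming technique related to, but not the same as, the lazy DOM construction described in the column. The sample document, its element names, and the `find_name_lazily` helper are all invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are made up.
SAMPLE = b"""<catalog>
  <item id="1"><name>alpha</name></item>
  <item id="2"><name>beta</name></item>
  <item id="3"><name>gamma</name></item>
</catalog>"""


def find_name_lazily(data: bytes, wanted_id: str):
    """Return (name, items_examined), stopping as soon as the item is found."""
    examined = 0
    # iterparse reads the source incrementally and yields each element when
    # its end tag is seen. Returning early means later chunks of a large
    # document are never read or turned into element objects.
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag != "item":
            continue
        examined += 1
        if elem.get("id") == wanted_id:
            return elem.findtext("name"), examined
        elem.clear()  # drop the children of an item we no longer need
    return None, examined


if __name__ == "__main__":
    print(find_name_lazily(SAMPLE, "2"))
```

With a document this small the saving is negligible; the benefit only shows up when the wanted element sits near the front of a large file.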
Maintaining information in the most open, expressive, and descriptive format enables it to be used in ways that may not have existed when an XML document was created, let alone been part of its original design objectives. By leaving the door open for all comers and all possibilities, native XML enables the broadest possible applications. By maintaining information in human-readable form, XML keeps that information accessible to anyone who must work with the documents, so they need not rely on a program's ability to render or interpret the contents to make sense of them.
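To make the earlier claim about transformation concrete, here is a minimal Python sketch that reads one XML document and delivers it to two different "consumers": one that wants JSON and one that wants a compact binary layout. Everything specific in it is an assumption for illustration only: the `readings` vocabulary, the helper names, and the 10-byte record layout are invented, not taken from any real format.

```python
import json
import struct
import xml.etree.ElementTree as ET

# Hypothetical source document; the vocabulary is invented for this sketch.
SAMPLE = """<readings>
  <reading sensor="7" value="21.5"/>
  <reading sensor="9" value="19.25"/>
</readings>"""

# Assumed binary layout: little-endian unsigned 16-bit sensor id followed by
# a 64-bit float value, with no padding (10 bytes per record).
RECORD = struct.Struct("<Hd")


def load_readings(xml_text):
    """Parse the XML into a list of (sensor, value) tuples."""
    root = ET.fromstring(xml_text)
    return [(int(r.get("sensor")), float(r.get("value")))
            for r in root.iter("reading")]


def to_json(readings):
    """Render the readings as a JSON array of objects."""
    return json.dumps([{"sensor": s, "value": v} for s, v in readings])


def to_binary(readings):
    """Pack the readings into fixed-size binary records."""
    return b"".join(RECORD.pack(s, v) for s, v in readings)


def from_binary(blob):
    """Unpack fixed-size binary records back into (sensor, value) tuples."""
    return [RECORD.unpack_from(blob, offset)
            for offset in range(0, len(blob), RECORD.size)]


if __name__ == "__main__":
    readings = load_readings(SAMPLE)
    print(to_json(readings))
    print(to_binary(readings).hex())
```

Note the asymmetry: the binary form can be produced from the XML at any time, but the binary records alone say nothing about what the two fields mean, which is the openness argument in a nutshell.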
Given all these things, it's no wonder that so many organizations, industry groups, associations, and other special interest groups are rushing to standardize XML encodings for their data universes. With all the foregoing advantages to help them, they'd be crazy not to do so!
Send an e-mail to email@example.com if you have questions on this or other XML topics.
Ed Tittel is a principal at LANWrights, Inc., a wholly owned subsidiary of LeapIt.com. LANWrights offers training, writing, and consulting services on Internet, networking, and Web topics (including XML and XHTML), plus various IT certifications (Microsoft, Sun/Java, and Prosoft/CIW).