Document type definitions strike back!
In two recent tips--"What's up with XML Schemas" (3/21/2001) and "A Follow-up on XML Schemas" (5/9/2001)--I explained some ways in which XML Schemas can add value when defining certain kinds of XML documents. This week, I'd like to stand that subject on its head and explain why good, old-fashioned, SGML-based Document Type Definitions (DTDs) still have a role to play in defining XML markup and documents. That's not to question or invalidate my earlier tips, but to present some technical areas in which DTDs continue to surpass Schemas, either in representational ability or in the ease of creating meaningful, usable subsets.
Murray Altheim, one of Sun Microsystems' pre-eminent SGML and XML gurus, has put together a great website at www.doctypes.org, in part to stem the momentum of opinion away from DTDs and toward XML Schemas. In this context, I'd like to quote the opening paragraph from the home page before I recap some excellent points he makes regarding the continued value--and right to exist--that DTDs should enjoy going forward.
"DTDs have been getting a lot of bad press lately. There seems to be an idea that in order for XML Schemas to live, DTDs must die. I hear voices chanting 'XML Schemas, XML Schemas, XML Schemas will solve all of our problems...' Many of these voices come from the database community, where the concept of 'structured data' is important but the priorities [are] quite different. There is a battle going on for the heart of XML: is it for documents or for data? We keep hearing that XML Schemas will provide the necessary content validation suitable for databases and object-oriented programming. Okay. But you'd almost begin to think that DTDs were no longer suitable for what they were designed to do, and do quite well: validate the markup structure of a document."
Add to validation another important concept--namely, modularization, as practiced with the XHTML specification--and you've got a pretty powerful combination of capabilities. As it happens, the much-vaunted combination of namespaces and schemas didn't buy as much for XHTML modularization as some had hoped. Thus, most of the work in defining meaningful subsets of the XHTML specification (which in turn define the modules into which XHTML markup is divided) has focused on DTDs rather than Schemas.
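To make the modularization idea concrete, here is a minimal sketch of the parameter-entity technique that DTD modules rely on. It is not taken from the XHTML modularization DTDs; the file names (inline.mod, mini.dtd), the entity name Inline.class, and the element names are all hypothetical and chosen only for illustration.

```xml
<!-- Hypothetical module file: inline.mod -->
<!-- Default set of inline elements. A driver can override this entity, -->
<!-- because the first declaration of an entity is the one that counts. -->
<!ENTITY % Inline.class "em | strong | code">

<!ELEMENT em     (#PCDATA | %Inline.class;)*>
<!ELEMENT strong (#PCDATA | %Inline.class;)*>
<!ELEMENT code   (#PCDATA)>
```

```xml
<!-- Hypothetical driver file: mini.dtd -->
<!-- Declare a smaller inline set BEFORE pulling in the module, -->
<!-- so the module's own default declaration is ignored. -->
<!ENTITY % Inline.class "em | strong">

<!-- Pull in the module by reference. -->
<!ENTITY % inline.mod SYSTEM "inline.mod">
%inline.mod;

<!ELEMENT doc (p+)>
<!ELEMENT p   (#PCDATA | %Inline.class;)*>
```

With this driver, code is still declared by the module but no longer appears in any content model, so the resulting document type is a subset of what the module offers. That override-then-include pattern is the general idea behind building subsets from DTD modules.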
There's also been quite a bit of discussion on the XML boards and mailing lists lately about using XML for data-centric applications versus document-centric applications. Here again, Schemas with their rich datatypes and abilities to incorporate namespaces appear to be better suited for the data-centric side of the street, whereas DTDs still appear to work better for more document-centric uses of XML.
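The contrast can be sketched with two small fragments. Both are illustrative assumptions rather than excerpts from any published DTD or schema, and the element names (para, emph, link, order, quantity, shipDate) are invented for the example. The first is a DTD fragment for document-style mixed content, where text and inline markup interleave freely. The second is an XML Schema fragment for data-style content, where each value carries a datatype that a DTD cannot express.

```xml
<!-- Document-centric: a DTD describing mixed content -->
<!ELEMENT para (#PCDATA | emph | link)*>
<!ELEMENT emph (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ATTLIST link href CDATA #REQUIRED>
```

```xml
<!-- Data-centric: an XML Schema with typed element content -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="quantity" type="xs:positiveInteger"/>
        <xs:element name="shipDate" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

In a DTD, quantity and shipDate could only be declared as (#PCDATA), so a value such as "soon" would pass validation; the schema rejects it. The mixed-content rule for para, on the other hand, takes a single short line in the DTD.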
The point that www.doctypes.org seeks to make is that it's not really a question of DTDs or Schemas, nor a matter of whether Schemas can (or could) replace DTDs. Rather, it's a matter of finding appropriate uses for both approaches and applying each where it makes the most sense. Just as www.doctypes.org is worth a visit, so are the points it makes worth pondering. The long list of entirely useful DTDs on the home page makes a pretty strong case that DTDs are indeed far from dead, and still have a lot to contribute to the mix, especially for what Altheim so circumspectly calls "human-readable documents."
I don't know what kind of documents you like to read, but I will certainly vote strongly for human-readable whenever possible! The point is well taken that conveying document structure and content in readable form is an important part of successful communication.
I want to chew on this further, and will probably find more reasons to compare and contrast DTDs and Schemas in tips to come, as well as explore appropriate uses for each approach. Stay tuned for further musings on this interesting and controversial subject.
Send an e-mail to Ed care of firstname.lastname@example.org if you have questions on this or other XML topics.
Ed Tittel is a principal at LANWrights, Inc., a wholly owned subsidiary of LeapIt.com. LANWrights offers training, writing, and consulting services on Internet, networking, and Web topics (including XML and XHTML), plus various IT certifications (Microsoft, Sun/Java, and Prosoft/CIW).