How to get started with XML
If there's one question I field constantly from people interested in XML, it might be stated simply as "How do I get started with XML?" or perhaps "Where should I start, if I want to learn XML?" In a world where XML applications keep multiplying, where DTDs and schemas compete for metalanguage responsibility, and where a basic understanding of markup languages and syntax underpins everything else, a few basic recommendations come in handy. They'll get you started down the road to XML knowledge and give you some sense of the fundamentals, while also leading naturally into other topics of immediate interest:
- The XML Information Set (XML Infoset) is essentially an abstract description of an XML document's parse tree: what the document looks like after a parser has digested it into a navigable collection of information items (elements, attributes, characters, and so on) organized according to the document's formal structure. Understanding what this data structure contains (the content and organization of any XML document), how to navigate it (walk the tree by any number of methods), how to validate it (compare the parse tree against an XML Schema or DTD to confirm compliance or find errors), and how to use it (process the tree's nodes with a specific purpose) is perhaps the most important step toward understanding what XML is and how it works.
The Infoset also leads quite naturally to the W3C Document Object Model (DOM) and to XPath, which provides a notation for addressing nodes in a parse tree or DOM. These technologies let documents be addressed, navigated, and manipulated. In the simplest terms, the Infoset defines a data model for XML, so that XML documents can be instantiated and interpreted, and their components accessed and used. The XML Information Set: https://www.w3.org/TR/xml-infoset/
- Although the W3C is pushing XML Schemas to the fore, a working knowledge of how standard or customized XML markup may be defined and referenced is essential. Today, this still means using SGML-derived document type definitions (DTDs), XML Schemas (a more complex alternative to DTDs that describes XML documents using XML markup itself), or another, proprietary schema language. Whichever you choose, you will have to get to know at least one of these validation technologies. This is the W3C's Schema page: https://www.w3.org/XML/Schema. A working knowledge of namespaces, which let a document draw on standard XML vocabularies without name collisions, is likewise important. Together, these technologies determine what counts as valid XML, be it standard or customized, and what kind of content that markup can capture; that understanding must be the cornerstone of any foray into the XML dimension. Here's the namespace recommendation: https://www.w3.org/TR/REC-xml-names/
- Doing something with XML content once it's captured remains the primary motivation for using XML, so some form of presenting that content is highly desirable. Although Cascading Style Sheets (CSS) work with XML and permit XML documents to be rendered in a Web browser, limited CSS support in Web browsers and issues with how those browsers handle native XML argue strongly for transforming XML into HTML for online delivery. That's where Extensible Stylesheet Language (XSL) Transformations, or XSLT, come into play: they not only provide a way to transform XML document contents, but can also supply additional text as part of those transformations. (Note: XPath is critical in this context, because it provides the document navigation that XSLT uses to select the content it transforms.)
As it happens, one of the biggest limitations of the CSS-XML combination is that it can only show what's in a document. Extra text (or special handling, such as building tables of values) often makes XML documents more readable, or labels their contents in a meaningful way. That's where XSLT really shines. At some point in the future, XSL Formatting Objects (XSL-FO) will also help manage native presentation of XML markup online, but XSL-FO is not yet ready for prime-time use. You can find info on XSLT at: https://www.w3.org/TR/xslt
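To try the Infoset, DOM, and XPath ideas from the first recommendation hands-on, here is a minimal sketch in Python (the language is an assumption; the tip itself names none), using only the standard library. The catalog document and the function names walk, english_titles, and dom_book_ids are hypothetical. Note that ElementTree understands only a limited subset of XPath, not the full language.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
DOC = """<catalog>
  <book id="b1" lang="en"><title>XML Basics</title><price>19.95</price></book>
  <book id="b2" lang="de"><title>XML Grundlagen</title><price>24.50</price></book>
</catalog>"""


def walk(element, depth=0):
    """Walk the parsed tree depth-first, yielding (depth, tag, attributes, text)."""
    text = (element.text or "").strip()
    yield depth, element.tag, dict(element.attrib), text
    for child in element:
        yield from walk(child, depth + 1)


def english_titles(xml_text):
    """Select titles of English-language books with ElementTree's XPath subset."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.findall("./book[@lang='en']/title")]


def dom_book_ids(xml_text):
    """Navigate the same document through the W3C DOM interface (minidom)."""
    document = minidom.parseString(xml_text)
    return [book.getAttribute("id") for book in document.getElementsByTagName("book")]


if __name__ == "__main__":
    for depth, tag, attrs, text in walk(ET.fromstring(DOC)):
        print("  " * depth + tag, attrs, text)
    print(english_titles(DOC))  # ['XML Basics']
    print(dom_book_ids(DOC))    # ['b1', 'b2']
```

minidom appears here only because it follows the W3C DOM interface (getElementsByTagName, getAttribute); for everyday Python work, ElementTree is usually the more idiomatic choice.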
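The namespace recommendation can also be tried out in a few lines. This hedged Python sketch uses two made-up namespace URIs under example.com to show how namespaces keep apart two elements that share the local name "name". Python's standard-library parsers do not validate against DTDs or XML Schemas, so the hypothetical is_well_formed function checks well-formedness only; validation would need a separate validating parser, which this sketch does not assume.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; any URIs you control would do.
NS = {
    "inv": "http://example.com/ns/invoice",
    "addr": "http://example.com/ns/address",
}

INVOICE = """<inv:invoice xmlns:inv="http://example.com/ns/invoice"
             xmlns:addr="http://example.com/ns/address">
  <inv:customer>
    <addr:name>Acme Corp.</addr:name>
  </inv:customer>
  <inv:name>Invoice 42</inv:name>
</inv:invoice>"""


def qualified_tags(xml_text):
    """List element names as ElementTree reports them: '{namespace-uri}local-name'."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


def names_by_vocabulary(xml_text):
    """Two elements share the local name 'name'; their namespaces keep them apart."""
    root = ET.fromstring(xml_text)
    return {
        "invoice": root.find("inv:name", NS).text,
        "address": root.find(".//addr:name", NS).text,
    }


def is_well_formed(xml_text):
    """Check well-formedness only: the stdlib parser does not validate against a DTD or schema."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(qualified_tags(INVOICE))
    print(names_by_vocabulary(INVOICE))
    print(is_well_formed(INVOICE), is_well_formed("<a><b></a>"))
```

The prefixes (inv, addr) are only local shorthand; what identifies each vocabulary is the namespace URI, which is why ElementTree reports tags in '{uri}local' form.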
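Python's standard library has no XSLT processor, so this sketch swaps in a hand-written ElementTree transformation that imitates what a simple XSLT stylesheet would do: select nodes by path, then emit HTML that adds a heading, column labels, and a summary line the source XML doesn't contain (the point made above about CSS only showing what's in a document). The source document, the catalog_to_html function, and the USD currency label are assumptions for illustration; a real project would typically write an XSLT stylesheet and run it through an XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: raw data with no presentation text of its own.
SOURCE = """<catalog>
  <book><title>XML Basics</title><price>19.95</price></book>
  <book><title>XSLT in Practice</title><price>29.00</price></book>
</catalog>"""


def catalog_to_html(xml_text):
    """Render the catalog as an HTML table, adding text the XML does not contain.

    This imitates a simple XSLT template in plain Python; it is not an XSLT processor.
    """
    root = ET.fromstring(xml_text)
    books = root.findall("book")  # roughly what select="book" picks out in XSLT

    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = "Book Catalog"  # generated heading

    table = ET.SubElement(body, "table")
    header = ET.SubElement(table, "tr")
    for label in ("Title", "Price (USD)"):  # generated column labels (currency assumed)
        ET.SubElement(header, "th").text = label

    for book in books:  # like <xsl:for-each select="book">
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = book.findtext("title")  # like <xsl:value-of select="title"/>
        ET.SubElement(row, "td").text = book.findtext("price")

    ET.SubElement(body, "p").text = f"{len(books)} titles listed"  # computed summary text
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(catalog_to_html(SOURCE))
```

Building the output as an element tree rather than concatenating strings means ElementTree escapes any &, <, or > in the data automatically, much as an XSLT processor would.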
With a strong grounding in these core XML technologies, and in the concepts and terminology they embody, would-be XML developers will be well on their way to making effective use of XML. This approach also prepares aspiring XML professionals for the many other applications, forms of markup, and complexities that XML encompasses, giving them the core competence they need to become effective practitioners.
With a good foundation in XML terminology, document definition and navigation tools, the basic information set, and document transformation, you'll be well equipped to figure out which way is up in your particular case, and to start working your way into a proper state of knowledge, if not the proper state of grace. Send e-mail to Ed Tittel at firstname.lastname@example.org if you have questions on this or other XML topics.
Ed Tittel is a principal at LANWrights, Inc., a wholly owned subsidiary of LeapIt.com. LANWrights offers training, writing, and consulting services on Internet, networking, and Web topics (including XML and XHTML), plus various IT certifications (Microsoft, Sun/Java, and Prosoft/CIW).