Problems converting HTML to other languages

My company wants to convert its HTML files to XML for two reasons: (1) to let us present our documents in any format, and (2) to let us extract parts of our documents to create new custom configurations. I can convert our HTML to XHTML and then to our own XML, for which I have a DTD (and a schema), but I am basically "wrapping" the XHTML in the elements I've defined. Converting back to HTML is not a problem, as I can use <xsl:copy-of select="."/> and get the old HTML back. However, converting to WML or anything else is not possible right now because the text is full of old HTML tags: <ol>, <ul>, <li> and even <table>, <tr> and <td>, which will not work in WML and probably not in other formats either. Any suggestions?
The crux of the problem is that the HTML elements are wrapped by the new DTD rather than modeled in it. Instead of wrapping existing elements, add a subset of XHTML's tag set to your own DTD. Restrict the subset to simple presentation elements that you know you can map to, or use directly in, other formats, e.g. p, b and perhaps ul. Note that table markup is particularly troublesome because HTML often uses it to achieve layout effects that are difficult or impossible to reuse in other formats.
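
As a rough illustration of why a small, known subset helps, here is a minimal Python sketch of mapping such a subset to WML. Everything in it is an assumption made for the example: the `section` wrapper element, the choice of p, b, ul and li as the allowed subset, the card id, and the decision to render each list item as its own WML paragraph with a leading dash. An XSLT stylesheet with one template per element would follow the same shape.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: a custom <section> element whose content
# model allows only a small XHTML-like subset (p, b, ul, li).
SOURCE = """<section>
  <p>Install the <b>driver</b> first.</p>
  <ul>
    <li>Unplug the device</li>
    <li>Run setup</li>
  </ul>
</section>"""


def append_text(dest, text):
    """Append text to dest, after its last child if it has one."""
    if not text:
        return
    if len(dest):
        last = dest[-1]
        last.tail = (last.tail or "") + text
    else:
        dest.text = (dest.text or "") + text


def copy_inline(src, dest, prefix=""):
    """Copy inline content: keep <b>, flatten any other inline element to text."""
    append_text(dest, prefix + (src.text or ""))
    for child in src:
        if child.tag == "b":
            bold = ET.SubElement(dest, "b")
            bold.text = "".join(child.itertext())
        else:
            append_text(dest, "".join(child.itertext()))
        append_text(dest, child.tail)


def to_wml(section):
    """Map each block element of the subset to a WML paragraph in one card."""
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", id="main")
    for block in section:
        if block.tag == "p":
            copy_inline(block, ET.SubElement(card, "p"))
        elif block.tag == "ul":
            # Assumption: no list markup in the target, so each item
            # becomes a paragraph with a dash prefix.
            for item in block.findall("li"):
                copy_inline(item, ET.SubElement(card, "p"), prefix="- ")
    return wml


if __name__ == "__main__":
    print(ET.tostring(to_wml(ET.fromstring(SOURCE)), encoding="unicode"))
```

Because the subset is closed, every element the converter can encounter has an explicit rule. That is what the wrapped, arbitrary XHTML cannot guarantee.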
This was last published in May 2003
