Capturing, categorizing and managing what looks like arbitrary data on the Web is a difficult undertaking. The problem, from a programming standpoint, is that the data carries no semantics or context of its own. Most XML data requires an accompanying DTD (document type definition) or schema to properly qualify, convey and interpret the content. Today most data extraction is fairly fragmented and application-specific. Typically the target is an RDBMS rather than XML, though XML is growing in importance as a target. One of the basic benefits of XML is that simply having the data in an XML format makes it transportable and shareable between applications. The real difficulty lies in carefully choosing the meaning of the XML tags and their content. This is the underlying problem no matter where the data originates and whether it is structured or unstructured.
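As a rough illustration of that last point, the sketch below exports rows from a relational table to XML using only the Python standard library. The table, its columns and the `rows_to_xml` helper are hypothetical and invented for this example; they do not come from any of the products discussed here. The mechanical part is short, but the resulting tags are just the column names, and nothing in the output says, for instance, what currency `balance` is in. That meaning has to be agreed on separately, for example in a schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, table, root_tag, row_tag):
    """Serialize every row of `table` as a <row_tag> element under <root_tag>.

    Hypothetical helper: `table` is assumed to be a trusted identifier,
    since it is interpolated directly into the SQL statement.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            # Each column becomes a child element named after the column.
            child = ET.SubElement(item, name)
            child.text = "" if value is None else str(value)
    return root


# Example data in an in-memory database (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, balance REAL)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme', 120.5)")
conn.commit()

root = rows_to_xml(conn, "customer", "customers", "customer")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Any application can parse the output, which is the portability benefit described above. Whether two applications interpret `<balance>` the same way is the harder question, and it is not one that the export step answers.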
A number of large vendors offer RDBMS and XML transfer solutions. Those that come to mind are Data Junction, Informatica, Ascential Software and DataMirror, along with a whole range of niche players. Informatica and Ascential are the clear leaders, though: they drive the bulk of the extraction/transformation/loading (ETL) market.
The "Big Three" database vendors (IBM, Oracle and Microsoft) seem to be having an increasing impact on this market, since they are bundling more and more ETL functionality with their products. IBM and Oracle in particular have the more substantial XML solutions (I am not sure about Microsoft). Of course, the solutions from the big database vendors are geared toward easy integration with their own technologies.