In a recent tip on data governance in service-oriented architectures, I briefly described how organizations can take stock of their data assets and whip them into an intelligible, documented and manageable shape. In this tip, I'd like to dig a little deeper into what experts call the data integration lifecycle, which is involved in imposing and maintaining data governance, and describe the many vital roles that XML can play in this process (numerous tool vendors already offer commercial packages for these very purposes).
To begin, let's review the seven steps that make up this data integration lifecycle, in which data is made visible, its value is assessed, and its forms and uses become better understood:
- Access: Although its forms may be poorly understood and documented and its sources somewhat murky, data must be used to have any kind of life or meaning. It comes from many places, including legacy applications and systems, databases, modern applications, XML messages of many kinds and numerous types of documents (spreadsheets, project plans, text documents and so forth).
- Discovery: This involves bringing all data sources out into the open, particularly documenting the uses and structures of poorly understood or described sources. This is also the point at which data semantics (patterns or rules that emerge from its structure and use) and quality issues should be noted and flagged for further action.
- Cleansing: Data is cleaned up to make sure it's correct, accurate and complete. Clean-up can involve detecting and correcting errors, supplying missing elements and values, enforcing data standards, validating entries and purging duplicates.
- Integration: Imposes a single comprehensive understanding of data across all systems and applications so that fragmented sources are combined and transformed to eliminate discrepancies in data structures, definitions, and representations. This often means resolving inconsistent use of and meanings for identical terms across different contexts.
- Delivery: Correct, relevant data is made available in proper form, in a timely manner, to all users and applications that need such access. This might mean anything from responding to queries that return single records or small answer sets to delivering entire data sets for trend analysis or enterprise-wide reporting. This step also addresses data security, availability, privacy and compliance requirements related to access and use (HIPAA for medical records, and so forth).
- Development and management: This is where XML-based toolsets really come into their own. They enable those who manage data, along with business analysts, architects, developers and managers, to work together in creating a comprehensive set of data integration rules, processes, practices and procedures, thereby capturing and implementing all the substantive work done in the five preceding steps. This step also tackles performance, scalability and reliability needs for key enterprise applications and services.
- Auditing, monitoring and reporting: Once its semantics and uses have been captured, omissions remedied, errors corrected, and quality examined and assured, ongoing observation and analysis are required to keep the data clean, correct, reliable and available. This part of the process makes it possible to flag potential issues as they occur and to cycle them back through this lifecycle to make sure they get resolved. Auditing also helps to make sure that data remains visible, under control, and able to guide future changes and enhancements.
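To make the cleansing and integration steps above a little more concrete, here is a minimal Python sketch that uses only the standard library's `xml.etree.ElementTree`. Everything in it is an assumption made for illustration: the two sample feeds, the element names, the `FIELD_MAPS` table and the choice to treat records with the same `id` as duplicates. It is not taken from any vendor's toolset, and a real project would more likely express such rules in a schema or transformation language.

```python
import xml.etree.ElementTree as ET

# Hypothetical feeds: two systems that describe customers with different tags.
CRM_XML = """
<customers>
  <customer><id>17</id><name> Ada Lovelace </name><email>ada@example.com</email></customer>
  <customer><id>17</id><name>Ada Lovelace</name><email>ada@example.com</email></customer>
  <customer><id>42</id><name>Alan Turing</name><email></email></customer>
</customers>
"""

BILLING_XML = """
<accounts>
  <account><custNo>99</custNo><fullName>Grace Hopper</fullName><mail>grace@example.com</mail></account>
</accounts>
"""

# Assumed integration rule: map each source's element names to shared fields.
FIELD_MAPS = {
    "customer": {"id": "id", "name": "name", "email": "email"},
    "account": {"custNo": "id", "fullName": "name", "mail": "email"},
}
REQUIRED_FIELDS = ("id", "name", "email")


def normalize(record):
    """Rename a record's child elements to the shared fields and trim whitespace."""
    mapping = FIELD_MAPS[record.tag]
    row = {}
    for child in record:
        if child.tag in mapping:
            row[mapping[child.tag]] = (child.text or "").strip()
    return row


def cleanse_and_integrate(*documents):
    """Return (clean, flagged) rows merged from all the given XML documents."""
    seen_ids = set()
    clean, flagged = [], []
    for doc in documents:
        root = ET.fromstring(doc.strip())
        for record in root:
            row = normalize(record)
            if row.get("id") in seen_ids:
                continue  # purge duplicates (assumption: same id means same record)
            seen_ids.add(row.get("id"))
            missing = [field for field in REQUIRED_FIELDS if not row.get(field)]
            # Incomplete rows are flagged for follow-up rather than silently dropped.
            (flagged if missing else clean).append(row)
    return clean, flagged


clean, flagged = cleanse_and_integrate(CRM_XML, BILLING_XML)
print("clean:", clean)
print("flagged:", flagged)
```

In this sketch, the record with the empty `email` element lands in `flagged` instead of being discarded, which mirrors the idea of cycling problem data back through the lifecycle for resolution.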
At all steps in this path, XML can play a major role. This is especially true in the first four steps (access, discovery, cleansing and integration), where XML representations and metadata can shed light on data structures, semantics, usage patterns and behavior rules. XML also can help to reconcile potentially conflicting views of the realities that such data is meant to model. This is where organizations can make huge gains, by making poorly understood or documented aspects of their data assets visible, correct and part of their governance processes.
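As a rough illustration of how an XML representation can shed light on data structures during discovery, the following Python sketch walks a document and counts how often each element path and attribute occurs. The sample document and the `profile` function are hypothetical and exist only for this example. Commercial discovery tools go much further, but the underlying idea of surfacing structure that was never written down is similar.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: an order feed whose structure was never documented.
SAMPLE_XML = """<orders>
  <order id="1"><customer>Ada</customer><total currency="USD">10.50</total></order>
  <order id="2"><customer>Alan</customer><total>7</total><note/></order>
</orders>"""


def profile(xml_text):
    """Count occurrences of every element path and attribute in a document."""
    paths = Counter()
    attrs = Counter()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths[path] += 1
        for name in elem.attrib:
            attrs[f"{path}/@{name}"] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths, attrs


paths, attrs = profile(SAMPLE_XML)
for path, count in sorted(paths.items()):
    print(path, count)
for attr, count in sorted(attrs.items()):
    print(attr, count)
```

Comparing the counts hints at which parts of the structure are optional: here `note` and the `currency` attribute each appear on only one of the two orders, which is the kind of undocumented rule worth flagging for further action.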
About the author
Ed Tittel is a full-time writer and trainer whose interests include XML and development topics, along with IT certification and information security topics. Among his many XML projects are XML For Dummies, 4th edition (Wiley, 2005) and Schaum's Easy Outline of XML (McGraw-Hill, 2004). E-mail Ed at firstname.lastname@example.org with comments, questions or suggested topics or tools for review.