As organizations move forward with their service-oriented architecture (SOA) implementation plans, they often find themselves grappling with significant, long-lived problems of data and semantic integration among a wide variety of data sources. The mismatch between the data representation that one application provides and the data representation that another application expects forms the core of this long-lived problem.
This data integration problem is particularly troublesome for SOA implementations because it limits the loose coupling between service providers and consumers, unless there is an effective way to decouple the provider and consumer's data representations. The need to access information of so many disparate types from so many disparate sources forms the semantic integration challenge that organizations must deal with today.
Effective service composition means effective semantic integration
What makes effective data and semantic integration a challenge is twofold: first, the representation of information and the information itself are often bound tightly together; and second, such information frequently lacks context. Developers often think not of the data themselves but rather of the structure of those data: schemas, data types, relational database constructs, file formats, and so forth. These structures don't pertain directly to the information at hand, but rather to our assumptions about what the data should look like.
Furthermore, companies frequently make independent decisions about technologies for different aspects of their business, and then concern themselves with tying together these disparate systems at a later date. In addition, many industries are prone to acquisition and merger activity, leaving their companies a complex checkerboard of differing technology implementations. As a result, there is rarely central decision making on technology adoption, and heterogeneity always rules the business that adapts to meet its changing market. Thus, companies continually find themselves with the challenge of meeting their integration requirements in a heterogeneous environment of continual change.
The difficulty of solving these semantic integration issues limits a company's ability to respond to change quickly, and thus limits SOA's ability to provide business agility. Adherence to standards ensures, to some degree, the loose coupling and composition of services and data exchange mechanisms, but there is no guarantee that the data exchanged with an individual Service will be universally comprehensible. Even when organizations use metadata standards such as XML to format messages, the schema of one Service might not match the schema of another, and fields with the same meaning might use differing tags.
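To make the tag mismatch concrete, here is a minimal Python sketch. The message, the tag names (`cust_nm`, `amt`), and the consumer's expected field names are all hypothetical, chosen only for illustration. Both sides use well-formed XML, yet the consumer finds none of the fields it is looking for.

```python
import xml.etree.ElementTree as ET

# Hypothetical message from a provider; the tag names are illustrative only.
provider_message = "<order><cust_nm>Acme Corp</cust_nm><amt>125.00</amt></order>"

# Hypothetical field names that a consumer was written to expect.
CONSUMER_FIELDS = ("customerName", "totalAmount")


def read_fields(xml_text, field_names):
    """Return {field: text or None} for each expected child element."""
    root = ET.fromstring(xml_text)
    # findtext() returns None when no child element has the requested tag.
    return {name: root.findtext(name) for name in field_names}


result = read_fields(provider_message, CONSUMER_FIELDS)
print(result)
```

The information is present in the message, but the consumer cannot reach it without knowing the provider's schema, which is exactly the tight coupling at issue.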
In order for data to flow unimpeded in a SOA implementation, therefore, we must liberate service providers and consumers from tightly-coupled knowledge of each other's data formats. Services should be concerned with the semantic meaning of the data, not the structure of the data. The issue here is therefore one of loose coupling. While we might loosely couple application interfaces, if we deal with data the same way that we have always done, by imposing the data structures of service providers on service consumers, the result is every bit as tightly coupled as previous architectural approaches. In order to deliver on the promise of seamless data integration, we must go beyond loosely coupling the application interface and also provide loose coupling at the semantic level.
A service-oriented approach to data integration
If semantic differences exist between service consumers and providers, some additional work needs to be done in order to overcome those differences and provide the veneer of loose coupling required in SOA. A service-oriented approach to this problem is to implement these data mapping and transformation capabilities as services that act as intermediaries between consumers and providers, freeing the provider and consumer to focus on the semantic meaning of the data rather than on the format of those data.
The use of data transformation services improves loose coupling by separating the concerns of service providers and consumers. The builder of a specific service can focus on the key functionality of that service, and the composer of a Service-Oriented Business Application (SOBA) that implements a service-oriented process can focus on process-level design concerns. The transformation services isolate the data transformation concerns, allowing personnel skilled in transformation techniques to focus on developing the data transformations. This approach leads to more maintainable solutions, lets each component focus on what it does best, and enables the organization to use its resources more effectively.
When an application or service must transform data from one format to another in order to achieve some desired data integration goal, the most common and straightforward approach is to embed data transformation logic directly within the application, as the figure below illustrates:
Figure: Using an embedded transformation
Embedding transformation logic within applications is a natural object-oriented design technique, because in the object-oriented world, applications encapsulate internal logic, including transformation logic. Nevertheless, there are significant disadvantages to embedding transformation logic within an application:
- Other applications or Services have no practical way of reusing the embedded transformations
- It's impossible to offload the CPU overhead that the embedded data transformations cause
- The transformation logic is tightly coupled to the business logic within the application, reducing the flexibility and reusability of the applications
- The transformation logic is handled programmatically rather than declaratively through metadata, further impeding the flexibility of the application
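The last point, programmatic versus declarative handling, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference to any particular product: the mapping table `ORDER_MAPPING`, the field names, and the `apply_mapping` helper are all hypothetical. The transformation rules live in data, so changing a rule means editing metadata rather than application code.

```python
# Hypothetical mapping metadata: source field -> (target field, converter).
ORDER_MAPPING = {
    "cust_nm": ("customerName", str.strip),
    "amt": ("totalAmount", float),
}


def apply_mapping(record, mapping):
    """Build a target record by applying each mapping rule to the source record."""
    target = {}
    for source_field, (target_field, convert) in mapping.items():
        # Source fields that are absent from the record are simply skipped.
        if source_field in record:
            target[target_field] = convert(record[source_field])
    return target


source = {"cust_nm": " Acme Corp ", "amt": "125.00"}
transformed = apply_mapping(source, ORDER_MAPPING)
print(transformed)
```

Because `apply_mapping` knows nothing about orders, the same function can serve any other mapping table, which is the kind of reuse that embedded, hard-coded transformations rule out.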
In many cases, data transformation logic may be so specific that another service or application is unlikely to need it, and so well built that it consumes few processing resources. The real problem with embedded transformation logic is that, over time, companies end up with a large amount of purpose-built, hard-coded, and tightly-coupled transformation logic scattered across the enterprise. This scenario points out the danger of hiding transformations from the enterprise: if we hide the cost and complexity of transformation within the cost of tightly-coupled applications, there is no way to gauge its true cost.
An alternative to embedding transformation logic within an application is to use an external transformation Service to handle the task. A transformation service, loosely-coupled in a SOA implementation of course, accepts incoming data in one format, and outputs data with the same semantic meaning in another format, as shown in the figure below.
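As a rough sketch of that idea, the Python below models a transformation service as a registry of transformations keyed by source and target format. Everything here is an assumption made for illustration: the class name `TransformationService`, the format labels `order-xml` and `order-json`, and the sample message are hypothetical, and a real deployment would expose the service over a network interface rather than as an in-process object.

```python
import json
import xml.etree.ElementTree as ET


class TransformationService:
    """Hypothetical in-process stand-in for an external transformation service."""

    def __init__(self):
        self._transformers = {}

    def register(self, source_format, target_format, func):
        """Make a transformation available under a (source, target) format pair."""
        self._transformers[(source_format, target_format)] = func

    def transform(self, payload, source_format, target_format):
        """Convert payload between formats, or raise ValueError if no rule exists."""
        try:
            func = self._transformers[(source_format, target_format)]
        except KeyError:
            raise ValueError(
                f"no transformation from {source_format} to {target_format}"
            ) from None
        return func(payload)


def order_xml_to_json(xml_text):
    """Re-express a flat XML order as a JSON object with the same fields."""
    root = ET.fromstring(xml_text)
    return json.dumps({child.tag: child.text for child in root}, sort_keys=True)


service = TransformationService()
service.register("order-xml", "order-json", order_xml_to_json)
output = service.transform(
    "<order><id>42</id><status>open</status></order>", "order-xml", "order-json"
)
print(output)
```

Callers name only the formats they have and need; which function performs the conversion, and where it runs, stays behind the service interface.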
In addition to externalizing data and semantic integration logic, use of an external transformation service makes it possible to offload CPU-intensive data transformations to systems optimized for such operations. While calling a service hosted on a remote system incurs the overhead of passing data to and from that system, the loosely-coupled, service-oriented approach of externalizing transformation logic through service interfaces allows organizations to independently scale the number of servers hosting the transformation service as necessary, without impacting existing Service implementations.
In addition to the benefits of loosely-coupled transformation logic, transformation services offer an organization considerable cost savings by reducing the hodgepodge of data integration and transformation tools spread across the enterprise. Licensing costs for these tools add up, and reducing the number of such solutions that a business has to deal with decreases hassles and cost. In addition, through the use of transformation services, organizations gain increased visibility of the transformation activity within the enterprise. Increased awareness of such transformation activities provides valuable insight, both for capacity planning and, if necessary, for undertaking data migration efforts. Disjoint transformation tools can't provide these insights.
The ZapThink Take
SOA has not changed the fact that organizations must contend with a wide range of diverse, disparate data sources on a daily basis in order to run their daily operations, make informed decisions, and work with their customers, partners, and suppliers. Companies continually find themselves with the challenge of meeting their data and application integration requirements in a heterogeneous environment of continual change.
The semantic integration challenge remains an obstacle to fully realizing SOA's promise of business agility. Organizations struggle with how to liberate the meaning of data from the structure of its representation, so that information can flow seamlessly between the services that power the business. Data transformation services can address the challenge of data and semantic integration. By moving transformation into a separate service, it's possible to handle data transformation declaratively, increasing the flexibility of the transformations.
Transformation services are most useful in groups of SOBAs that share services amongst themselves. Orchestration logic can route information through a transformation service, both before passing it to a Service, and before processing a response from a service. From the perspective of the SOBA's author, a transformation service is simply another service, and data within the application flow seamlessly from one Service to another. The transformation services simply perform the necessary steps to make the process work properly.
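The routing described above might look like the following Python sketch, in which the orchestration treats transformations as ordinary steps placed before and after the business service. All names and field layouts (`credit_service`, `customerId`, `cust_id`, `cr_lim`) are hypothetical, and each step is a plain function standing in for a remote service call.

```python
def to_provider_format(request):
    """Hypothetical request transformation: consumer field names to provider names."""
    return {"cust_id": request["customerId"]}


def to_consumer_format(response):
    """Hypothetical response transformation: provider field names to consumer names."""
    return {"customerId": response["cust_id"], "creditLimit": float(response["cr_lim"])}


def credit_service(request):
    """Hypothetical provider that only understands its own field names."""
    limits = {"C-100": "5000.00"}
    return {"cust_id": request["cust_id"], "cr_lim": limits[request["cust_id"]]}


def orchestrate(request, steps):
    """Pass a message through each step in order, feeding each result to the next."""
    message = request
    for step in steps:
        message = step(message)
    return message


reply = orchestrate(
    {"customerId": "C-100"},
    [to_provider_format, credit_service, to_consumer_format],
)
print(reply)
```

Neither the consumer nor `credit_service` knows the other's field names; only the two transformation steps do, so either side can change its format without touching the other.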
(This ZapFlash is an excerpt of a recent White Paper on this topic, Transformation Services for SOA)