If you don't account for and carefully manage data sources during application design, there's a real risk that the application will fail to meet performance, resilience and elasticity expectations. This risk is especially acute in analytics applications that draw from multiple data sources.
However, there are five ways to address the problem of multiple data sources in an application architecture: Know what data you need to combine, use data visualization, add data blending tools, create abstracted virtual database services and determine where to host data sources. Let's look at what each of these tasks entails and why they make a difference in application data management.
1. Know what data to combine
The first thing to understand is what you should combine, both in terms of the data sources themselves and how the combined data will be used. When it comes to multiple data sources, the right management tool depends on the storage formats involved and the goal you have in mind for the data.
For example, most business data resides in relational databases, which support standard functions such as queries that draw from multiple physical databases. Whatever the data format, your queries need to align with the format of the data they address.
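To make the multiple-physical-database case concrete, here is a minimal sketch of a single query that joins two separate database files, using SQLite's ATTACH mechanism. The file names, schemas and rows are hypothetical examples, not a recommended schema.

```python
# Minimal sketch: one query joining two physical databases via SQLite ATTACH.
# The file names, schemas and rows below are hypothetical examples.
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
orders_path = os.path.join(tmp, "orders.db")
customers_path = os.path.join(tmp, "customers.db")

# Two separate database files stand in for two physical data sources.
with sqlite3.connect(orders_path) as db:
    db.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
    db.execute("INSERT INTO orders VALUES (1, 100, 25.0), (2, 101, 40.0)")

with sqlite3.connect(customers_path) as db:
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    db.execute("INSERT INTO customers VALUES (100, 'Acme'), (101, 'Globex')")

# One connection attaches the second database, so a single query spans both.
conn = sqlite3.connect(orders_path)
conn.execute("ATTACH DATABASE ? AS cust", (customers_path,))
rows = conn.execute(
    "SELECT c.name, o.total "
    "FROM orders o JOIN cust.customers c ON o.customer_id = c.id "
    "ORDER BY o.id"
).fetchall()
conn.close()
# rows -> [('Acme', 25.0), ('Globex', 40.0)]
```

The same cross-database join would look different in other engines -- for example, PostgreSQL uses foreign data wrappers -- which is exactly why the query approach must align with the storage format.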
In terms of mission, narrow the focus according to whether the application uses:
- real-time analytics inquiries that delve into a broad range of data;
- structured queries in a query language such as SQL; or
- direct application access via APIs.
The application's query approach determines whether you need a visualization tool, such as Google Data Studio, to manage your relational database management system (RDBMS), or whether you need to implement specific design patterns instead. If you run real-time analytics, you're best off using something like Data Studio. If you work with structured queries in non-real time, pay particular attention to how RDBMS queries are permitted to join multiple databases or create their own databases. Finally, if you use an API, you can develop a storefront or adapter design pattern to shield the APIs from direct use and impose restrictions on data relationships or database creation within that pattern. With any of these choices, you have to manage the performance risks that can arise from improper use and placement of information.
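As a sketch of the adapter option, the class below wraps a hypothetical raw data API and exposes only approved operations. `RawInventoryAPI`, its tables and the allowed-table policy are all illustrative assumptions, not any specific product's interface.

```python
# Sketch of an adapter design pattern shielding a data API from direct use.
# RawInventoryAPI, its tables and the allowed-table policy are hypothetical.
class RawInventoryAPI:
    """Stand-in for a direct, unrestricted data-source API."""
    _DATA = {
        "stock": {"sku-1": 12, "sku-2": 7},
        "audit": {"sku-1": "restricted history"},
    }

    def query(self, table, key):
        return self._DATA[table].get(key)


class InventoryAdapter:
    """Callers go through the adapter, which imposes the restrictions."""
    _ALLOWED_TABLES = {"stock"}  # the adapter decides what is reachable

    def __init__(self, api):
        self._api = api

    def query(self, table, key):
        if table not in self._ALLOWED_TABLES:
            raise PermissionError(f"table '{table}' is not exposed")
        return self._api.query(table, key)


adapter = InventoryAdapter(RawInventoryAPI())
level = adapter.query("stock", "sku-1")  # allowed, returns 12
# adapter.query("audit", "sku-1") would raise PermissionError
```

Because every data request funnels through the adapter, you can later add caching, logging or new restrictions in one place without touching the callers.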
2. Use data visualization
Software managers trying to unify multiple data sources should use a data visualization dashboard to map them out. Data visualization provides value when the user plans to interactively analyze and query information. The approach helps architects get a clear view of their data, and it also helps them manage the relationship between data sources. There are a variety of visualization tools available from vendors such as Google, IBM and Oracle.
Consider data visualization a primary tool to manage multiple data sources in an application, even if you're not specifically looking at dynamic data visualization to gain business insight. Using a visualization tool is a great way to lay out data source relationships, test the value of cross-source analysis and even assess how database placement in the network or cloud will affect application performance. Anyone who does any form of query or analytics work should have a visualization tool.
3. Turn to data blending tools
Data blending tools are useful for analytics applications. These tools turn multiple data sources into a unified data source through a join clause that lets you define multisource data relationships and reuse them as necessary. Data blending is an increasingly common feature in visualization tools.
As with other tools selected to manage multiple data sources, data blending capabilities must align with the characteristics of the given data sources. For example, if you have information stored in a nonstandard database structure, this specific format might dictate which data blending tool you can use.
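In spirit, data blending is a join across sources that you define once and reuse. The sketch below blends two in-memory "sources" with a reusable inner join on a shared key; the field names and records are hypothetical.

```python
# Sketch: blending two data sources with a reusable inner join on a shared key.
# The field names and records below are hypothetical.
def blend(left, right, key):
    """Inner-join two lists of records (dicts) on `key`, merging matches."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


crm = [
    {"customer": "Acme", "region": "EU"},
    {"customer": "Globex", "region": "US"},
]
billing = [
    {"customer": "Acme", "spend": 1200},
    {"customer": "Initech", "spend": 300},
]

blended = blend(crm, billing, "customer")
# Only 'Acme' appears in both sources, so one blended record results:
# [{'customer': 'Acme', 'region': 'EU', 'spend': 1200}]
```

A real blending tool adds outer joins, type coercion and scale, but the core idea is the same: the relationship is defined once, then reused wherever the unified view is needed.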
4. Create virtual database services through abstraction
It's surprising how many companies use data studios for interactive analytics and then make discrete API calls for the same sort of data when they write applications. Don't fall into this trap. Forcing an application to process too many database formats can cause performance to suffer. This practice can also create a scenario where data source correlation isn't consistent for every user.
Abstract database services facilitate application access to multiple data sources. Because these services define -- and hide -- the complex way that information can connect across sources, they encourage standardized information use and reduce development complexity. They also create a small number of services you can use to identify the specific data sources and determine what users are doing with them. For compliance purposes, this data abstraction is a critical function.
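A minimal sketch of such an abstracted service: callers request a named entity and never learn which underlying source answers, while the service records every access for compliance review. The routing table, source labels and records are hypothetical.

```python
# Sketch of an abstracted data service: callers ask for a named entity and
# never see which source answers. Routes, labels and records are hypothetical.
class DataService:
    def __init__(self):
        # Routing table: entity name -> (source label, fetch function).
        # The mapping is defined -- and hidden -- inside the service.
        self._routes = {
            "customer": ("crm_db", lambda key: {"name": key, "tier": "gold"}),
            "invoice": ("billing_db", lambda key: {"id": key, "amount": 99.0}),
        }
        self.access_log = []  # who asked for what: the compliance trail

    def get(self, entity, key, user="anonymous"):
        source, fetch = self._routes[entity]
        self.access_log.append((user, entity, source))
        return fetch(key)


svc = DataService()
record = svc.get("customer", "Acme", user="report-job")
# The caller gets the record; the log shows which source served the request.
```

Swapping `crm_db` for a replacement source later means changing one routing entry, not every application that consumes customer data.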
5. Decide where to host data sources
Finally, consider where you will host data sources and whether the network connections to them are sufficient. Data sources are almost abstract in nature, because you access them through a logical name rather than a network or data center address. Because the information's location is typically hidden, it may not be obvious how accessing it from a specific data store will affect application performance.
Public cloud access to data sources is a prime example of the difficulties related to application performance optimization. When cloud applications access data in the data center or another cloud, traffic charges and network transit delays can mount up. Applications that access multiple data sources magnify this issue, and the abstraction performed in data blending can further exacerbate the problem by hiding the specifics of each data source. Therefore, when you design cloud applications, either host the data in the cloud, or abstract the database access to a service and run that service local to the data.
Application orchestration and deployment tools, such as Kubernetes for containerized applications, can connect database resources to application components, but it's still possible to misuse those data sources. Always take care to maximize the value and efficiency of data-centric applications in order to improve cost and performance.