Using microservices to build modern, 12-factor applications can be a double-edged sword. They make the development...
process much more streamlined and agile, but microservices complicate performance management. By multiplying the number of application components, microservices significantly increase the number of interactions and dependencies with highly varying contributions to overall performance. Implementing end-to-end performance monitoring and tracing is never an easy task, which explains surges in the application and performance management software market.
The problem with microservices
Microservices create a much larger component footprint than developers may be used to. A microservice architecture creates the application equivalent of a network effect in which the number of potential component interactions increases geometrically with the number of microservices used. And it's not just the number of interactions that matter, but the variable nature of them. Microservices typically get their inputs from other microservices. Their outputs, in turn, feed others. However, since those outputs aren't necessarily uniform, the results of one microservice can significantly affect the workload of another. For example, if a microservice produces data processed by others, the amount and complexity of the data will change downstream performance. It's not necessarily the volume of output, but its nature. If one microservice creates database query strings, for example, the resulting queries can significantly affect the completion time and, hence, overall response time.
Beyond the performance implications, the integration of multiple microservices and their API connections adds risks since each integration is also a potential point of failure. Furthermore, applications that call third-party services use a distributed or hybrid cloud architecture can suffer from variable network bandwidth when using instances over the WAN or on different LANs.
Although the challenges are significant, the microservices performance management problem is solvable. We will examine some of the tools and techniques that can help and offer some recommendations to make sure microservices create end-to-end performance problems that are comparable to death by a thousand cuts.
Tools and techniques
When using microservices, performance analysis is a two-level problem comprising both granular telemetry of individual microservices and comprehensive, end-to-end measurement of all applications. Microservice telemetry is used to identify internal bottlenecks, inefficiencies and bugs, while application-wide monitoring seeks to see performance from the user's perspective and identify problems within the microservice ecosystem and its connections.
Characterizing macro-scale performance in a microservice application is akin to analyzing website responsiveness. That's because it requires distributed application tracing to break down macro-transactions into components and see how each contributes to overall latency or application throughput. Given the maturation of web application analysis, it's not surprising that component-level tracing has been pioneered by those building browser applications. These have evolved into tools like Google Stackdriver Trace, Spring Cloud Sleuth, SolarWinds TraceView or Zipkin that can trace transactions end to end to show delays for individual components.
Getting the most detailed picture of aggregated microservices application performance requires collecting as much data as possible. Most of the relevant performance data comes from aggressive logging, both internal microservice operations and for the application as a whole. Turning that data into useful information requires aggregating it into a single repository where it can be mined for performance insights and anomalies. This can be accomplished with software that is designed to pore over terabytes or more of log data, like Loggly, Splunk or Sumo Logic.
Although log file performance analysis can be ad hoc and manual, it's best to automate wherever possible. But keep in mind that automation requires the creation of clearly defined expectations for application and microservices performance, as well as set performance limits that trigger alerts when they are exceeded. These allow DevOps teams to proactively investigate performance problems before they become apparent to users.
In order to avoid performance issues down the road, it's important to build performance management in from the start of a microservice-based application development project. In this section we list a few specific suggestions for building that in:First, comprehensive microservices performance management requires measuring and analyzing infrastructure, including networks, servers and containers. It also requires analyzing code and the combined, end-to-end application. You'll need data collection and tools in place at each level. As mentioned earlier, distributed tracing tools are useful in analyzing overall application performance and component bottlenecks, while existing infrastructure management and monitoring systems should be able to generate the necessary system and network performance data.
The next thing to consider is that microservices need integrated telemetry and monitoring code to enable granular, so-called in situ monitoring. These functions should log performance-relevant data and events that can be aggregated and analyzed for performance-robbing behaviors.
Also, consider doing synthetic load testing before production to simulate real-world performance and find bottlenecks. Tools like New Relic provide a synthetic API that can be used to build test scripts. Cloud services are ideal for building test beds since you can emulate real-world conditions without spending a fortune on infrastructure.
It's also important to fully consider all microservice interactions and the different failure scenarios they create. Have some contingency planning exercises where you work through the implications of bottlenecks or failures in various microservices.
Finally, use automation to find critical microservice interactions. A Chaos Monkey approach which randomly fails or throttles various components of a microservice application is a good way to determine overall performance sensitivity to different conditions. Netflix pioneered the use of random changes to large distributed systems with its Simian Army, a suite of tools that it has open sourced with scripts and configuration instructions available for Amazon Web Services. Although designed for virtual machines like Elastic Cloud Compute, the techniques and code can also be applied to a containerized, microservice environment. There is also software for Docker, called Pumba, that automates random failures for resiliency and performance testing and that could be easier to adapt to containerized applications.
Learn how microservices can bring agility to SOA
An exclusive guide to working with Agile and DevOps
How to make microservices and cloud applications fit together