Chaos Monkey is a popular open source tool developed by Netflix that takes reliability testing to new heights....
The idea behind Chaos Monkey testing is to deliberately kill random nodes across the system at regular intervals to assess whether the system can survive these failures. The same principle can help evaluate the reliability of microservice-based applications, but rather than intentionally killing nodes, architects should focus on interrupting services.
When one service fails, the services that depend on it can stall or fail in a ripple effect, which degrades the user experience. There are several ways to deal with this issue, and when used in combination, they can help architects design more resilient systems.
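To make the idea of interrupting services concrete, here is a minimal, in-process sketch. It is not how Chaos Monkey itself works; the `Service` class, `interrupt_random_service` and `place_order` are hypothetical names invented for illustration. The sketch switches off one randomly chosen service and shows a caller that degrades instead of letting the failure ripple outward.

```python
import random


class Service:
    """Hypothetical stand-in for a microservice that a test can switch off."""

    def __init__(self, name):
        self.name = name
        self.available = True

    def call(self, payload):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name} handled {payload}"


def interrupt_random_service(services, rng):
    """Mark one randomly chosen service as unavailable and return it."""
    victim = rng.choice(services)
    victim.available = False
    return victim


def place_order(catalog, recommendations, order_id):
    """Caller that treats the catalog as essential and recommendations as optional."""
    result = {"catalog": catalog.call(order_id)}
    try:
        result["recommendations"] = recommendations.call(order_id)
    except ConnectionError:
        # Degrade gracefully so one failed dependency does not fail the order.
        result["recommendations"] = None
    return result
```

In a test, you would interrupt a service and then assert that callers such as `place_order` still return a usable result. An interruption of the catalog would still raise here, which is exactly the kind of gap such a test is meant to expose.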
Fallback services matter
Replicas and fallbacks are not just for infrastructure components, but for mission-critical services as well. Consider creating an alternate, bare-bones service that can take on the load if the default service fails. This is especially useful for services such as billing and transaction processing for e-commerce applications.
Setting up a fallback service could be as simple as routing traffic away from the failed service. Before flipping the Chaos Monkey testing switch, though, you need to ensure backup services for critical processes are ready to take over.
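The routing idea can be sketched in a few lines. This is an illustration under stated assumptions, not a production router: `primary_billing` and `minimal_billing` are hypothetical functions, and the primary is hard-coded to fail so the fallback path is visible.

```python
def call_with_fallback(primary, fallback, request):
    """Try the primary service; route the request to the fallback if it fails."""
    try:
        return primary(request)
    except (ConnectionError, TimeoutError):
        # The primary is unreachable, so the bare-bones service takes the load.
        return fallback(request)


def primary_billing(request):
    """Hypothetical full-featured billing service, simulated here as down."""
    raise ConnectionError("billing service is down")


def minimal_billing(request):
    """Hypothetical bare-bones fallback: accept the charge for later processing."""
    return {"order": request["order"], "status": "queued"}
```

For example, `call_with_fallback(primary_billing, minimal_billing, {"order": 42})` returns the queued result from the fallback. Only connection and timeout errors trigger the reroute in this sketch; other exceptions still propagate so that genuine bugs are not hidden.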
Building resilient infrastructure
With modern container stacks, it is now possible -- even easy -- to automatically restart failed containers and set up autoscaling container clusters. This automation helps ensure resiliency in the infrastructure layer.
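Container orchestrators handle restarts for you, so the following is only an in-process analogy of a restart policy, not the API of any container platform. The `supervise` function and its `max_restarts` parameter are hypothetical names chosen for this sketch.

```python
def supervise(task, max_restarts=3):
    """Run task, restarting it on failure.

    Returns (result, restarts_used). Re-raises the last error once the
    restart budget is exhausted, so persistent failures stay visible.
    """
    restarts = 0
    while True:
        try:
            return task(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1
```

Capping the number of restarts is a deliberate choice in this sketch: a task that never recovers should surface as a failure rather than loop forever.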
At the networking level, shorter timeout limits mean that requests to a failed service give up sooner, so traffic can be rerouted to a fallback service quickly. This keeps callers from hanging on a service that is no longer responding.
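A short timeout combined with a fallback might look like the sketch below, which uses Python's standard `concurrent.futures` module. The function name `call_with_timeout` and the 0.2 second default are assumptions for illustration; suitable limits depend on the service.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool; the size is an arbitrary choice for this sketch.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(primary, fallback, request, timeout_s=0.2):
    """Call primary, but switch to fallback if no answer arrives in time."""
    future = _pool.submit(primary, request)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: a call that is already running is not interrupted.
        future.cancel()
        return fallback(request)
```

Note that the slow call keeps running in its worker thread after the timeout; the caller simply stops waiting for it. Errors raised by `primary` itself are re-raised by `future.result` and are not handled in this sketch.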
Having persistent data storage for containers is also essential. When a container fails, its stored data is lost unless you configure persistent storage volumes. Persistent storage ensures that, even if a service fails, it can be easily resumed with the old, stored data.
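From the service's point of view, using persistent storage can be as simple as checkpointing state to a path on the mounted volume and reading it back on startup. The sketch below assumes such a path exists; `save_state` and `load_state` are hypothetical helper names, and the example does not configure the volume itself.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state as JSON, replacing the old file atomically."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    # os.replace swaps the file in one step, so a crash mid-write
    # leaves the previous checkpoint intact.
    os.replace(tmp_path, path)


def load_state(path, default):
    """Return the last saved state, or default if nothing was saved yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default
```

A restarted service would call `load_state` first and continue from the returned value, then call `save_state` after each unit of work it cannot afford to lose.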
When data is lost despite all these efforts, a disaster recovery (DR) tool can help you regain access to it after a mass failure. DR tools and services are worth purchasing if you need true data resilience.
Whether it's the service layer or the infrastructure layer, you can build more resilient applications by employing the Chaos Monkey testing principle in your microservice applications.