Gilt Groupe Inc. provides an online luxury consumer goods marketplace that offers deep discounts. It quickly grew from a $1 million per year business to a $400 million business. A key part of its business strategy has been to make new offers of a limited number of goods in the afternoon, which creates a massive spike in demand.
This creates scalability challenges. To address this problem, Gilt moved from a monolithic Ruby architecture to a higher performance Java microservices architecture. At the JavaOne conference in San Francisco, Adrian Trenaman, senior vice president of engineering at Gilt, based in New York, shared some of the best practices for enterprise architects to address this type of transition. These include improving the testing environment, thinking about service ownership, optimizing the deployment tools and implementing a strategy for aggregating data from different microservices.
Staging versus test-in-production
When you have more than 250 microservices, you need to rethink how you set up the testing environment. In the past, Gilt had five testing environments that worked great for a monolithic architecture. But it was much harder to maintain separate testing environments for hundreds of microservices. As a result, it was much easier for the testing environment to get out of date.
To address this gap, Trenaman recommends that organizations invest in tools to test in production and to improve tools for deploying safely into production. One strategy is to leverage dark canary code, which allows engineers to see changes that are not initially applied to live customer interactions. In this model, a load balancer sits in front of a collection of microservices running on separate servers. The new service is applied to one server, and then expanded out to the others after a testing phase. This reduces the need for upfront testing.
But this approach does not work for all microservices, Trenaman cautioned. Some teams -- such as those responsible for order processing -- test all the time, because a failure might result in accidental billing that angers customers. "We have looked at ways to create minimal testing for those teams," Trenaman said. "If you are thinking about going down this road, it is important to think about how you are going to test."
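The load balancer model described above can be sketched in a few lines of Java. This is a minimal illustration of plain weighted canary routing, not Gilt's tooling and not the "dark" variant in which customers never see the new code. The class name CanaryRouter, the node names and the canaryShare parameter are all hypothetical.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch: sends a configurable fraction of requests to one
// canary node and spreads the rest evenly over the stable nodes.
final class CanaryRouter {
    private final List<String> stableNodes;
    private final String canaryNode;
    private final double canaryShare; // fraction of traffic for the canary, 0.0 to 1.0
    private final Random random;

    CanaryRouter(List<String> stableNodes, String canaryNode,
                 double canaryShare, Random random) {
        if (stableNodes.isEmpty()) {
            throw new IllegalArgumentException("at least one stable node is required");
        }
        if (canaryShare < 0.0 || canaryShare > 1.0) {
            throw new IllegalArgumentException("canaryShare must be between 0.0 and 1.0");
        }
        this.stableNodes = List.copyOf(stableNodes);
        this.canaryNode = canaryNode;
        this.canaryShare = canaryShare;
        this.random = random;
    }

    // Picks the node that should handle the next request.
    String pickNode() {
        if (random.nextDouble() < canaryShare) {
            return canaryNode;
        }
        return stableNodes.get(random.nextInt(stableNodes.size()));
    }
}
```

A team might start with a small share such as 0.05, watch error rates on the canary node, and raise the share only once the new version looks healthy.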
Ownership is massively important
As the number of microservices grows, it is harder to keep track of which team and individual are responsible for each microservice. Initially, Gilt explored the idea of a microservice for tracking all the others. Trenaman said a much better practice was to set up a shared spreadsheet that details the services, what they do, other services they impact and the ownership of each service.
This becomes an architectural management problem when engineers change jobs or leave the company. Gilt uses teams of four to five engineers who are responsible for writing Java microservice code and putting it into production. These teams are grouped into departments managed by engineering directors. Trenaman said these directors track how staffing changes affect ownership of microservice responsibilities.
Another good practice is to classify the ownership status of microservices using the shared spreadsheet. Each service is owned by individuals on a team, and that ownership is classified as active, passive or at risk. Using this information, a department can create an ownership pie chart; when the amount of at-risk code in a department reaches a critical threshold, more resources can be devoted to that group.
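The threshold check behind such a pie chart could look like the following Java sketch. It assumes the spreadsheet rows have been exported into simple objects; the names ServiceEntry, OwnershipReport and needsAttention, as well as the threshold value, are hypothetical and not taken from Gilt.

```java
import java.util.List;

// The three ownership classes mentioned in the article.
enum OwnershipStatus { ACTIVE, PASSIVE, AT_RISK }

// Hypothetical row of the shared ownership spreadsheet.
final class ServiceEntry {
    final String name;
    final String department;
    final OwnershipStatus status;

    ServiceEntry(String name, String department, OwnershipStatus status) {
        this.name = name;
        this.department = department;
        this.status = status;
    }
}

final class OwnershipReport {
    // Fraction of a department's services that are classified as at risk.
    static double atRiskShare(List<ServiceEntry> entries, String department) {
        long total = entries.stream()
                .filter(e -> e.department.equals(department))
                .count();
        if (total == 0) {
            return 0.0;
        }
        long atRisk = entries.stream()
                .filter(e -> e.department.equals(department))
                .filter(e -> e.status == OwnershipStatus.AT_RISK)
                .count();
        return (double) atRisk / total;
    }

    // True when the at-risk share has reached the chosen threshold.
    static boolean needsAttention(List<ServiceEntry> entries, String department,
                                  double threshold) {
        return atRiskShare(entries, department) >= threshold;
    }
}
```

The sketch counts services rather than lines of code; a department that prefers to weight larger services more heavily would need an extra size column.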
As part of the transition to Java microservices architecture, Gilt moved much of its application infrastructure to Amazon Web Services. "It is important to have great tooling around deployment when you deploy to the cloud," Trenaman said. "We have about seven different ways to deploy services at Gilt because there are so many things to change."
To ease the deployment process, Gilt created the Ionroller set of tools and made it available as open source software. This makes it easy to roll new traffic over to new nodes on Amazon while keeping old ones running.
In the beginning, Gilt made the mistake of handwriting APIs for Java and Scala. Trenaman said the problem was that the client had dependencies that in turn had dependencies of their own. This created a lot of problems for feature upgrades.
Trenaman strongly recommends zero-dependency clients. Such a client can be generated automatically as a single file that has no dependencies and uses only APIs commonly available on the platform.
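As a rough illustration of what a zero-dependency client can look like, the following single-file Java sketch uses only the HTTP client that ships with the JDK (java.net.http, available since Java 11). It is written by hand here rather than generated, and the class name ProductClient, the products path and the example URL are assumptions made for illustration, not a Gilt API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for an imagined product service. It depends only on
// the JDK, so adding it to a project pulls in no third-party libraries.
final class ProductClient {
    private final URI baseUri;
    private final HttpClient http = HttpClient.newHttpClient();

    ProductClient(String baseUrl) {
        // A trailing slash makes URI.resolve append to the base path.
        this.baseUri = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
    }

    // Builds the request without sending it, which keeps it easy to inspect.
    HttpRequest buildGetProduct(long productId) {
        return HttpRequest.newBuilder(baseUri.resolve("products/" + productId))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends the request and returns the raw JSON body on HTTP 200.
    String getProductJson(long productId) throws IOException, InterruptedException {
        HttpResponse<String> response =
                http.send(buildGetProduct(productId), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("unexpected status " + response.statusCode());
        }
        return response.body();
    }
}
```

Returning the body as a string sidesteps the need for a JSON library; a generated client would more likely emit its own small parsing code into the same file.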
Audits and alerting
There are a lot of subtle ways microservices architecture can break without any change in user experience. Even though each service tested out correctly, the complex interactions between them can lead to unforeseen failures or business logic flaws. To address this problem, Trenaman recommends investing in system monitoring and alerting tools that ensure events expected to be true are consistently true.
Gilt created a system monitoring tool, called Cave, which allows engineers to monitor for business rules. For example, the number of orders shipped in the U.S. in the past five minutes should not be zero. By tracking these business-level metrics, Gilt engineers have another way to determine if any given deployment breaks some element of business logic.
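The shipped-orders example can be expressed as a small rule check. The Java sketch below is not Cave's API; it only illustrates the idea of asserting a business-level condition over a time window. The class name ShippedOrdersRule and the assumption that shipment timestamps are available as a list are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical business rule: at least one order must have shipped
// within the given window ending at "now".
final class ShippedOrdersRule {
    private final Duration window;

    ShippedOrdersRule(Duration window) {
        this.window = window;
    }

    // Returns true when the rule holds, false when an alert should fire.
    boolean holds(List<Instant> shipmentTimes, Instant now) {
        Instant cutoff = now.minus(window);
        return shipmentTimes.stream()
                .anyMatch(t -> t.isAfter(cutoff) && !t.isAfter(now));
    }
}
```

A scheduler could evaluate the rule every minute with a five-minute window and page the owning team whenever holds returns false.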
Stage data separately
A best practice with microservices architecture is to stage the data for each service separately. This helps reduce the dependencies between microservices, and between microservices and a centralized database. Trenaman said Gilt has not found a good distributed transaction processor that could be used across the board.
The use of distributed databases and data stores also creates a new challenge around data analytics and maintaining a central system of record. To address this problem, Trenaman recommends that data engineers develop a strategy for pulling this data together for analytics and reporting.