Microservices at Constant Contact – a view from the trenches

Part 1 of a two-part series. See part 2 here.

Microservices are a hot topic of discussion these days. The tech world is agog with stories of Netflix’s and PayPal’s architectures and how microservices helped them get to where they are today. A few weeks ago, Stefan Piesche wrote an excellent article on “Microservices at Constant Contact” in which he delves into the rationale for adopting microservices and the benefits that accrue once they are adopted. In this article, I provide a view from the trenches that draws upon our experience implementing microservices at Constant Contact. Although ours was a greenfield project, we had already paid our dues to a monolithic architecture working on other products and learned from those experiences. While we have several services in our organization, in this article I discuss our Trackable Coupon application, which allows our customers to set up coupons and distribute them to their contacts, who can then claim them.

Defining Boundaries

Coming from a monolithic world, we had to figure out the boundaries for our services.  In our case, our domain model, business capabilities and scalability requirements went hand in hand.  We settled on:

  • a service to handle functions performed by our customers –  creating and configuring coupon campaigns
  • a service for our customers’ customers – claiming and redeeming coupons

The latter service, where our customers’ contacts claim and redeem coupons, has higher scalability requirements.  Once we had decided on the services, we identified our resources and operations and defined the endpoints.  We also applied the Single Responsibility Principle to verify that each service conformed to the function for which it was intended.  We needed some additional services to handle other functions such as generating landing pages.

Consistent Endpoint Naming Structure

When you are working with a number of services (as in our case), it pays to define a consistent and uniform pattern for structuring endpoint URIs.  While REST does not stipulate a standard for URIs, we adopted the commonly used hierarchical structure, which makes the API intuitive for end users.  Here are some examples:

Get all the coupons for a customer:
https://<service_name_url>/v1/customers/{customer_id}/coupons

For this URI, we use the {customer_id} to define the scope of the coupons to be retrieved.

Get a specific claim for a given coupon:
https://<service_name_url>/v1/customers/{customer_id}/coupons/{coupon_id}/claims/{claim_id}

The {coupon_id} narrows the scope within the customer’s coupon collection and the {claim_id} identifies the specific claim amongst the collection of claims for that coupon.  Note that this endpoint is exposed by the second service (claims and redemptions), though it follows the same URI conventions.

This pattern can be extended to other types of campaigns such as events and surveys.
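
To illustrate how such a hierarchical structure might map onto code, here is a minimal sketch of a resource class in a JAX-RS style Java service. The class and method names are hypothetical, and the claims endpoint is shown alongside the coupons endpoint only to illustrate the URI pattern; as noted above, it actually lives in a separate service.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Illustrative resource class; storage is omitted in this sketch.
@Path("/v1/customers/{customer_id}/coupons")
@Produces(MediaType.APPLICATION_JSON)
public class CouponsResource {

    // GET /v1/customers/{customer_id}/coupons
    @GET
    public List<Map<String, Object>> getCoupons(@PathParam("customer_id") String customerId) {
        // {customer_id} scopes the result to one customer's coupon collection
        return Collections.emptyList();
    }

    // GET /v1/customers/{customer_id}/coupons/{coupon_id}/claims/{claim_id}
    @GET
    @Path("/{coupon_id}/claims/{claim_id}")
    public Map<String, Object> getClaim(@PathParam("customer_id") String customerId,
                                        @PathParam("coupon_id") String couponId,
                                        @PathParam("claim_id") String claimId) {
        // {coupon_id} narrows the scope to one coupon, {claim_id} to one claim within it
        return Collections.emptyMap();
    }
}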

Start small

We started with a single endpoint and implemented a thin vertical slice of the service.  We implemented a GET initially and mocked the data to get the endpoint working.  Once we were able to access the payload with a REST client like Postman, we implemented a simple POST and integrated with Cassandra for storage.  Implementing this initial vertical slice helped us shake out integration issues before we had progressed too far into the project.  It also allowed our QA team to set up their test infrastructure and write initial tests against these endpoints.
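
For the storage end of that vertical slice, the POST handler does not need much more than a prepared statement. Below is a rough sketch using the DataStax Java driver; the keyspace, table and column names are illustrative rather than our actual schema.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.math.BigDecimal;

// Sketch of the Cassandra-backed DAO that the simple POST delegates to.
public class CouponDao {

    private final Session session;
    private final PreparedStatement insert;

    public CouponDao(String contactPoint) {
        Cluster cluster = Cluster.builder().addContactPoint(contactPoint).build();
        this.session = cluster.connect("coupons_ks");   // illustrative keyspace name
        this.insert = session.prepare(
            "INSERT INTO coupons (customer_id, coupon_id, title, discount_amount) " +
            "VALUES (?, ?, ?, ?)");
    }

    public void save(String customerId, String couponId, String title, BigDecimal discount) {
        session.execute(insert.bind(customerId, couponId, title, discount));
    }
}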

Validation, Error Handling and Compensation

Errors are a fact of life – data validation errors and business logic errors have to be checked and reported.  We realized early on that having a consistent error reporting mechanism would be critical to maintaining our sanity.  Nothing is more irritating for a consumer than inconsistent error handling across multiple services.  With this in mind, we agreed upon a convention for HTTP status codes and error responses.  For example, we decided we would not overload HTTP codes to mean something other than what they are intended for.  While this seems like a no-brainer, we have run into APIs where the response code could not be relied upon and we had to inspect the payload to figure out whether the operation succeeded.  We also settled on a standard error response.  Here is an example:

[
  {
    "error_key": "error.coupon.minimum_purchase_amount_violation",
    "error_message": "Discount Amount 12.00 cannot be greater than Minimum Purchase Amount 10.00"
  },
  {
    "error_key": "error.coupon.end_date_past",
    "error_message": "Date conflict encountered. End date 09-10-2015 is in the past"
  }
]

The response has an HTTP status code of 400 (Bad Request) and is an array of objects, each consisting of an error_key and an error_message.  In this case, the request payload failed business logic validation.  Providing errors in a consistent format like this across all our services allows end users to parse errors easily.  Internally, it allows us to use a shared library for parsing errors when we call other services.
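
Because the error format is just an array of key/message pairs, the shared parsing code stays small. Here is a rough sketch of what such a helper might look like, assuming Jackson for JSON binding; the class and method names are illustrative, not our actual library.

import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative model for the standard error response: an array of
// {error_key, error_message} objects returned with an HTTP 400.
public class ServiceError {

    public String error_key;
    public String error_message;

    // Parse the body of a 4xx response from any of our services.
    public static List<ServiceError> parse(String responseBody) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        return Arrays.asList(mapper.readValue(responseBody, ServiceError[].class));
    }
}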

JSON Schema

We also implemented JSON schema validation for data validation, so the service catches these types of errors up front.  See “Use JSON schema to validate Web Service Requests” by Joe Simone for a good description of this topic.
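
As a rough illustration of how such validation sits in front of the business logic, the sketch below uses the everit-org json-schema library and a hypothetical coupon-request.json schema file; neither is necessarily what we run in production, and the violation messages would be mapped onto the standard error response shown earlier.

import java.util.Collections;
import java.util.List;
import org.everit.json.schema.Schema;
import org.everit.json.schema.ValidationException;
import org.everit.json.schema.loader.SchemaLoader;
import org.json.JSONObject;
import org.json.JSONTokener;

// Illustrative request validator: schema violations are collected up front,
// before any business logic runs.
public class CouponRequestValidator {

    private final Schema schema = SchemaLoader.load(new JSONObject(new JSONTokener(
            CouponRequestValidator.class.getResourceAsStream("/schemas/coupon-request.json"))));

    public List<String> validate(String requestBody) {
        try {
            schema.validate(new JSONObject(requestBody));
            return Collections.emptyList();
        } catch (ValidationException e) {
            // Each message becomes an error_key / error_message pair in the response
            return e.getAllMessages();
        }
    }
}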

Idempotence

We ran into an issue with distributed transactions: if a service made calls to multiple services and one of them failed, we had to roll back any changes made prior to the failed call.  We had to ensure that endpoints existed for making these compensating calls and that all our calls were idempotent, so that we did not leave resources in an inconsistent state.
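
One simple way to keep compensation safe is to make the rollback calls themselves idempotent, so they can be retried without side effects. A loose sketch follows, assuming a thin HTTP client wrapper with a delete(url) method that returns the status code; both the wrapper and the contacts endpoint are hypothetical.

// Sketch of an idempotent compensating call: rolling back a contact that may or
// may not have been created. Treating 404 the same as 204 means a retry of the
// rollback, or a rollback of a create that never happened, is harmless.
public class ContactRollback {

    private final SimpleHttpClient httpClient;   // hypothetical wrapper: delete(url) returns the HTTP status

    public ContactRollback(SimpleHttpClient httpClient) {
        this.httpClient = httpClient;
    }

    public void rollbackContact(String contactId) {
        int status = httpClient.delete("https://<contacts_service_url>/v1/contacts/" + contactId);
        if (status == 204 || status == 404) {
            return;   // deleted now, or already gone / never created
        }
        throw new IllegalStateException("Rollback of contact " + contactId + " failed with HTTP " + status);
    }
}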

Platform Service

Once we had written our services, each handling its specific task, we realized that operations in certain services followed a specific pattern.  In our domain, responding to a campaign (claiming a coupon, registering for an event, responding to a survey, etc.) involves persisting the contact data, persisting campaign-specific data, posting tracking activities, sending a confirmation email and so on.  These activities also have to follow a specific sequence.  To keep this consistent, as well as to avoid duplicating code across multiple services, we created another service, which we call the platform service, to handle the orchestration of these services.
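
A rough sketch of the kind of orchestration the platform service performs for a coupon claim is shown below. The service client interfaces, method names and the compensation approach are illustrative, not our production code; the important part is the fixed sequence and the rollback of prior steps when a later one fails.

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative orchestration of a campaign response (here, a coupon claim).
public class CampaignResponseOrchestrator {

    private final ContactsClient contactsService;   // hypothetical service clients
    private final CouponClient couponService;
    private final TrackingClient trackingService;
    private final EmailClient emailService;

    public CampaignResponseOrchestrator(ContactsClient contacts, CouponClient coupons,
                                        TrackingClient tracking, EmailClient email) {
        this.contactsService = contacts;
        this.couponService = coupons;
        this.trackingService = tracking;
        this.emailService = email;
    }

    public void handleCouponClaim(String couponId, String contactEmail) {
        Deque<Runnable> compensations = new ArrayDeque<>();
        try {
            String contactId = contactsService.saveContact(contactEmail);
            compensations.push(() -> contactsService.deleteContact(contactId));

            String claimId = couponService.saveClaim(couponId, contactId);
            compensations.push(() -> couponService.deleteClaim(couponId, claimId));

            trackingService.postActivity(contactId, couponId);
            emailService.sendConfirmation(contactId, couponId);
        } catch (RuntimeException e) {
            // Undo completed steps in reverse order; each compensating call is
            // idempotent, so a retry of a failed rollback is safe.
            while (!compensations.isEmpty()) {
                compensations.pop().run();
            }
            throw e;
        }
    }
}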

Synchronous vs Asynchronous Calls

Currently these are all synchronous calls.  We plan to revisit this, since some of them could be asynchronous.  Posting tracking activities and sending confirmation emails are good candidates, since they have no direct bearing on the response.  Please note that having a platform service does not preclude individual services from calling each other.  The idea is not to create a hub-and-spoke architecture or route all calls through the platform service, but rather to route through it those calls that require coordination, with the attendant complexity of error handling and compensation.
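
As a small sketch of what that change could look like (reusing the same hypothetical service clients as above), the tracking and email calls could be dispatched off the request path once the claim itself has been persisted:

import java.util.concurrent.CompletableFuture;

// Sketch only: persistence stays synchronous, while the calls that have no
// bearing on the response are pushed to the background. A message queue between
// services would give the same decoupling with stronger delivery guarantees.
void afterClaimPersisted(String contactId, String couponId) {
    CompletableFuture.runAsync(() -> trackingService.postActivity(contactId, couponId));
    CompletableFuture.runAsync(() -> emailService.sendConfirmation(contactId, couponId));
}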

Testing

We place a high premium on testing our code.  Developers work closely with QA to ensure that all facets of our codebase have adequate test coverage.  As we developed our services, we tested the following:

  • Unit tests to test our components
  • Integration tests with Cassandra
  • Integration tests to test our integration with other services
  • Cucumber tests to test our endpoints (see the step-definition sketch below)
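
As an example of the endpoint-level tests, here is a sketch of a Cucumber step definition in Java. The feature wording, the test HTTP helper and the endpoint are hypothetical, not taken from our actual test suite.

import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;
import static org.junit.Assert.assertEquals;

// Illustrative step definitions for exercising the coupons endpoint end to end.
public class CouponSteps {

    private final TestHttpClient client = new TestHttpClient();   // hypothetical test helper
    private String customerId;
    private TestResponse response;

    @Given("a customer {string} with an active coupon campaign")
    public void customerWithCampaign(String customerId) {
        this.customerId = customerId;
        client.createTestCampaign(customerId);
    }

    @When("the customer requests their coupons")
    public void requestCoupons() {
        response = client.get("/v1/customers/" + customerId + "/coupons");
    }

    @Then("the response status is {int}")
    public void checkStatus(int expected) {
        assertEquals(expected, response.status());
    }
}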

A few months ago, I came across an excellent article that describes microservice testing architectures.  We found that we were aligned with the strategies outlined in that article and, borrowing from the architecture described therein, our testing architecture can be delineated as shown below:

[Figure: our microservice testing architecture]

Tests are run during software builds, and a build fails if any tests fail.  We also monitor our code coverage using Sonar.  We adopted the “Three Amigos” pattern, wherein at the start of each story the developer, QE and the product owner meet to review the story and come up with tests for its use cases.  This helps us clarify the stories and fosters a shared understanding of the requirements, which results in a comprehensive set of tests.  The challenge, of course, is for the developers to close the loop with QE when changes occur or features are modified.  A failing test will point these out, but the real danger is in deploying changes without adding corresponding new tests.  Our QE team is proactive in this regard; they have the developers review their test suite at multiple intervals during the development process.

Next post:  In part 2, I discuss automation and continuous deployment, troubleshooting, monitoring and metrics, performance, automated documentation and more.

Share your thoughts and experiences implementing microservices in the comments section below. 
