Designing Microservices: Characteristics of a Service

From an engineering perspective, systems grow in three ways: there are more features, those features grow more complex, and there are more requests to those features. To address this complexity, as with most refactoring, we extract interfaces and subdivide the system into smaller, isolated, more manageable parts that can be composed together. A successful refactoring creates a less brittle system that is easier to develop as a whole.

Microservices are a popular way to break down monoliths into smaller, more manageable pieces. Microservices are really just services with a specific, well-defined scope. Services are defined by their API and the events they publish. APIs and events represent contracts, and these contracts give us the building blocks for composition. We must decide how big these building blocks should be: very fine-grained services offer the most independence but require more interaction, while coarse-grained services allow more to happen internally at the risk of coupling.

How APIs and Events Help Systems

It is not always possible, nor practical, to call a service for data using its defined API contract. In this case the service emits an event, and the client consumes that event and chooses how to act on it. An event, like an API, is a contract. The consuming service could transform and store the data in that event, trigger another action, or ignore the message.

The benefits of events are threefold. First, producers of events have no knowledge of their consumers, so new consumers can be deployed without changes to producers. This differs from an API interaction, where a client must explicitly call a service. Second, consumers can fail without affecting producers, which lessens coupling and increases resiliency. Third, events exist as facts which occurred at a specific time. Once emitted, events do not change; instead, new events are emitted, adding new facts to the system.

A service can choose how much information to include in an event. It could be just an identifier, with consumers asking the service API for more information. In general, events should carry a payload that describes what occurred in enough detail for consumers to act on it independently. This opens many opportunities, particularly around data analysis and streaming.
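The three benefits can be seen in a minimal in-process sketch. It assumes a hypothetical `EventBus` class standing in for a real broker such as Kafka, and a hypothetical `VideoViewed` event. A real broker delivers asynchronously; here a `try`/`except` around each handler stands in for that isolation, so treat this as an illustration rather than a production design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass(frozen=True)  # frozen: an emitted event is an immutable fact
class VideoViewed:
    video_id: str
    user_id: str
    occurred_at: datetime


Handler = Callable[[VideoViewed], None]


class EventBus:
    """Hypothetical in-memory stand-in for a broker such as Kafka."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        # New consumers register here; the producer's code does not change.
        self._handlers.append(handler)

    def publish(self, event: VideoViewed) -> List[Exception]:
        # The producer only publishes; it never references specific consumers.
        errors: List[Exception] = []
        for handler in self._handlers:
            try:
                handler(event)
            except Exception as exc:  # a failing consumer does not fail the producer
                errors.append(exc)
        return errors


# Usage: one healthy consumer and one failing consumer.
view_counts: Dict[str, int] = {}


def count_views(event: VideoViewed) -> None:
    view_counts[event.video_id] = view_counts.get(event.video_id, 0) + 1


def broken_consumer(event: VideoViewed) -> None:
    raise RuntimeError("notification consumer is down")


bus = EventBus()
bus.subscribe(count_views)
bus.subscribe(broken_consumer)
errors = bus.publish(VideoViewed("v1", "u1", datetime.now(timezone.utc)))
```

The view counter still receives the event even though the other consumer fails, and adding either consumer required no change to the code that publishes.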

It is important to note that the source system owns the data and defines the events it emits; the consuming system must recognize that it holds a copied, unauthoritative version of that data which may be stale. You should not attempt to reuse events across systems.
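As a sketch of what holding an unauthoritative copy can look like on the consumer side, the class below keeps a local cache of blog titles built from events. The `BlogUpdated` shape and its `version` field are assumptions for illustration: it presumes the owning service stamps each change with an increasing version, which lets the consumer discard redelivered or out-of-order events. Even so, the cache can lag behind the owner.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class BlogUpdated:
    # Hypothetical event shape; the blog service owns and defines the real contract.
    blog_id: str
    title: str
    version: int  # assumed: increases with every change, assigned by the owner


class BlogTitleCache:
    """A consumer's local, unauthoritative copy of blog titles."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, str]] = {}

    def apply(self, event: BlogUpdated) -> bool:
        current = self._entries.get(event.blog_id)
        # Skip events that are not newer than what we hold (redelivered or out of order).
        if current is not None and current[0] >= event.version:
            return False
        self._entries[event.blog_id] = (event.version, event.title)
        return True

    def title(self, blog_id: str) -> Optional[str]:
        # May be stale; the blog service API remains the source of truth.
        entry = self._entries.get(blog_id)
        return entry[1] if entry is not None else None


cache = BlogTitleCache()
cache.apply(BlogUpdated("b1", "Draft", 1))
cache.apply(BlogUpdated("b1", "Final", 3))
applied_late = cache.apply(BlogUpdated("b1", "Edited", 2))  # arrives late, ignored
```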

Events are commonly used to separate reads and writes, so reads can be optimized for queries and writes can be optimized for storage. This pattern is called Command Query Responsibility Segregation (CQRS). You could write data to a database but read that data from Elasticsearch or some other optimized read store, possibly combined with other data. Materialized views work in a similar way, albeit internally within a database. Separating reads and writes also creates separate failure domains: one side can keep operating while the other is down.
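A minimal sketch of Command Query Responsibility Segregation, under simplifying assumptions: in-memory dictionaries stand in for the write database and for a read store such as Elasticsearch, and the write side calls the projection directly instead of going through a broker. All class names are hypothetical. In a real deployment the projection would consume events asynchronously, so reads would be eventually consistent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class VideoCreated:
    video_id: str
    title: str
    channel: str


class VideoWriteStore:
    """Write side: one record per video, keyed by id (stand-in for a database)."""

    def __init__(self, publish: Callable[[VideoCreated], None]) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}
        self._publish = publish

    def create(self, video_id: str, title: str, channel: str) -> None:
        self._rows[video_id] = {"title": title, "channel": channel}
        # Every successful write is announced as an event.
        self._publish(VideoCreated(video_id, title, channel))


class VideosByChannelView:
    """Read side: a denormalized projection for one query (stand-in for a search index)."""

    def __init__(self) -> None:
        self._titles_by_channel: DefaultDict[str, List[str]] = defaultdict(list)

    def on_video_created(self, event: VideoCreated) -> None:
        self._titles_by_channel[event.channel].append(event.title)

    def titles_for(self, channel: str) -> List[str]:
        # .get avoids inserting empty lists for unknown channels.
        return list(self._titles_by_channel.get(channel, []))


view = VideosByChannelView()
store = VideoWriteStore(publish=view.on_video_created)
store.create("v1", "Intro", "alice")
store.create("v2", "Deep dive", "alice")
store.create("v3", "Hello", "bob")
```

The write side stays a simple keyed store, while the read side is shaped around the single question "which videos belong to this channel?" and could be rebuilt from the event stream if it were lost.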

Services own some set of functionality. Consumers call a service API explicitly to interact with that functionality. Events allow data to flow between services asynchronously and independently, as a result of something happening. A service should define an API contract and be explicit, via event contracts, about the events it emits.

Characteristics of a Service

There are four characteristics which engineers should look for when defining a new service, whether it is a feature or resource service:

  1. The service must be the definitive source of truth for the data and functionality it is intended to cover (the bounded context). This means not only owning the write path but also protecting its data from other services that could bypass its API and event contracts. For instance, if a service owns blog information, all other services that use blog information should receive it through the blog API or a BlogPublished/BlogUpdated event, and all blog writes go through the blog service. The service can store its data however it sees fit, and change that storage, as long as it adheres to its external contracts. It might use a key/value store, switch to a blob or SQL store, or use a mix of these.
  2. The service should prevent coupling through its service and event contracts, which simplifies abstractions. Consuming services should not need to know anything about the service beyond those contracts, and the service should shield them from its internal changes. Lastly, a service should operate with only the data it owns and the explicit service dependencies it requires. If you need to authorize a user, there should be an authorization service which abstracts the details, and dependencies, of how authorization occurs. Similarly, if a service needs data from another system, it should go through a service or event contract. It should not go directly to a SQL table: if it does, a schema change on that table could break the consuming service, which is a sign of tight coupling. A service that sits between the table and the consumer prevents this coupling.
  3. Given an understanding of the business and its direction, will this service add value and open up new opportunities? Asking this question helps engineers liberate tightly coupled internal pieces of the system, creating new opportunities. For instance, being notified of business-related changes to the system can be beneficial in many ways. If a video streaming service publishes a stream of ‘video viewed’ events, we can create new services which leverage that stream of data: we can forecast system load, analyze popularity, and trigger notifications.
  4. Engineers must understand what it means for this service to be considered available. A batch processing service might be available if it successfully completes a task once a day within a four-hour window. A backend service might be available if it responds with a correct 2xx, 3xx, or 4xx response within 50ms at the median and 500ms at the 99th percentile. We define a Service Level Objective (SLO) for the service, stating the percentage of requests it must serve within its availability target. For a reasonably high-QPS service, this is often expressed as the percentage of all minutes during which the service meets its availability target. It is usually measured in 9s: ‘four nines’ means 99.99% of the time (about 4.38 minutes of unavailability per month!).
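The backend-service example from the fourth characteristic can be turned into arithmetic. This sketch makes several assumptions that are not part of the text above: a minute counts as good if it has no 5xx responses and its median and 99th-percentile latencies are within 50ms and 500ms; percentiles use the nearest-rank method; an idle minute counts as available; and a month is the average 43,830 minutes (365.25 days / 12). The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Request = Tuple[int, float]  # (HTTP status code, latency in ms)


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def minute_is_good(requests: List[Request]) -> bool:
    if not requests:
        return True  # assumption: an idle minute counts as available
    # 2xx, 3xx and 4xx are correct responses; any 5xx marks the minute as bad.
    if any(status >= 500 for status, _ in requests):
        return False
    latencies = [latency for _, latency in requests]
    return percentile(latencies, 50) <= 50 and percentile(latencies, 99) <= 500


def slo_attainment(minutes: List[List[Request]]) -> float:
    """Fraction of minutes in which the service met its availability target."""
    good = sum(1 for requests in minutes if minute_is_good(requests))
    return good / len(minutes)


def downtime_budget_minutes(slo: float, period_minutes: float = 43_830) -> float:
    """Allowed bad minutes per period; 43,830 is the average month."""
    return (1 - slo) * period_minutes


minutes = [
    [(200, 20.0), (200, 30.0), (404, 10.0)],  # good: a 4xx is still correct
    [(200, 40.0), (503, 15.0)],               # bad: server error
    [(200, 80.0), (200, 90.0), (302, 70.0)],  # bad: median is 80 ms
    [(200, 12.0), (200, 25.0), (200, 30.0)],  # good
]
attainment = slo_attainment(minutes)  # 2 of 4 minutes are good
budget = downtime_budget_minutes(0.9999)  # roughly 4.38 minutes for four nines
```

Counting good minutes rather than good requests keeps a brief burst of errors from being averaged away by many healthy requests, which is one reason the per-minute view is popular for high-QPS services.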

Services should minimize the blast radius of a change. This means that they generally should be releasable independently, and maintain forward and backward compatibility to allow for rollbacks and changes in their dependencies. Designing services and determining service boundaries around the four characteristics helps foster a healthy service ecosystem.
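Forward and backward compatibility applies to event contracts as well as APIs. One widely used approach, shown here as a swapped-in technique rather than one the text prescribes, is the "tolerant reader": consumers read only the fields they need, default optional ones, and ignore unknown ones, while producers only ever add optional fields. The sketch assumes JSON payloads and a hypothetical `VideoPublished` event whose `language` field was added in a later producer release.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoPublished:
    video_id: str
    title: str
    language: str = ""  # hypothetical field added in a later producer release


def parse_video_published(raw: str) -> VideoPublished:
    """Tolerant reader: pick known fields, default optional ones, ignore the rest."""
    data = json.loads(raw)
    return VideoPublished(
        video_id=data["video_id"],          # required since the first version
        title=data.get("title", ""),
        language=data.get("language", ""),  # absent in events from older producers
    )


# An event from an older producer and one from a newer producer with an extra field.
old_event = parse_video_published('{"video_id": "v1", "title": "Intro"}')
new_event = parse_video_published(
    '{"video_id": "v1", "title": "Intro", "language": "en", "thumbnail_url": "t.png"}'
)
```

Because the reader tolerates both shapes, either the producer or the consumer can be rolled forward or back independently without breaking the other.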

Entity services should keep validation to a minimum. Validation requires context, which is more appropriate in a higher-level feature service. In our video streaming example, a video must have a title and a license in some contexts but not in others, so the entity service must not prevent a video from being created without a license or title.
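One way to split validation along these lines is sketched below. `VideoEntityService`, `PublishingService`, and the `license_id` field are hypothetical names: the entity service accepts a video with no title or license, while the publishing feature service enforces the rules that only apply when a video is published.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Video:
    video_id: str
    title: Optional[str] = None
    license_id: Optional[str] = None


class VideoEntityService:
    """Entity service: stores videos and deliberately enforces no business rules."""

    def __init__(self) -> None:
        self._videos: Dict[str, Video] = {}
        self._ids = itertools.count(1)

    def create(self, title: Optional[str] = None, license_id: Optional[str] = None) -> Video:
        video = Video(f"v{next(self._ids)}", title, license_id)
        self._videos[video.video_id] = video
        return video


class PublishingService:
    """Feature service: publishing is the context where title and license are required."""

    def __init__(self, videos: VideoEntityService) -> None:
        self._videos = videos

    def publish(self, title: str, license_id: str) -> Video:
        checks = (("title", title), ("license", license_id))
        missing = [name for name, value in checks if not value]
        if missing:
            raise ValueError("cannot publish without: " + ", ".join(missing))
        return self._videos.create(title=title, license_id=license_id)


videos = VideoEntityService()
draft = videos.create()  # e.g. an upload in progress: no title or license yet
published = PublishingService(videos).publish("Intro", "CC-BY-4.0")
```

If another feature later needs different rules, it can add its own checks without changing the entity service or breaking existing callers.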

Service, API and Event Design

Once you have a framework for determining what a service should be, you can design the API and event contracts around that service. Kafka is often used as a platform for events; Kubernetes and gRPC are a popular combination for microservices; and an API Gateway can present disparate services under one umbrella to the outside world.
