Gracefully shut down an RPC service

RPC servers usually have several components: a thread pool (or green threads), local resources that are shared between requests (logging, metrics, throttling data, caches, etc.), and remote resources (databases and dependency systems). It is usually hard to cleanly shut down an RPC server because of the different types of resources involved.

Being able to shut down cleanly is worth the effort, though. We exit in a known and consistent state. Clients are notified and receive clear error messages instead of timeouts. Remote resources are notified and released. We can make sure metrics and logs are collected.

Stop incoming requests

The first step is to gradually stop incoming requests. We start by telling our clients that we are shutting down and that they should seek service elsewhere. This is sometimes known as lame duck mode. In this mode we continue to serve requests, and the notification can simply be attached to response messages.

Cooperative clients should gradually stop sending requests. Once a grace period has elapsed after lame duck mode is enabled, we return immediate errors to all clients. Then we start dropping incoming network connections. At this point there are no new tasks to do.
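The phases above can be sketched as a small state machine. This is only an illustration: ShutdownGate, Phase, handle and the "shutting_down" response field are hypothetical names, and a real RPC framework would carry the notification in its own headers or metadata.

```python
import enum
import time


class Phase(enum.Enum):
    SERVING = "serving"
    LAME_DUCK = "lame_duck"
    REJECTING = "rejecting"


class ShutdownGate:
    """Hypothetical helper that tracks which shutdown phase the server is in."""

    def __init__(self, grace_seconds, clock=time.monotonic):
        self._grace_seconds = grace_seconds
        self._clock = clock
        self._lame_duck_since = None  # None means normal serving

    def enter_lame_duck(self):
        # Remember when lame duck mode started; later calls are no-ops.
        if self._lame_duck_since is None:
            self._lame_duck_since = self._clock()

    def phase(self):
        if self._lame_duck_since is None:
            return Phase.SERVING
        if self._clock() - self._lame_duck_since < self._grace_seconds:
            return Phase.LAME_DUCK
        return Phase.REJECTING


def handle(gate, request, serve):
    """Wrap a request handler `serve` with the shutdown phases."""
    phase = gate.phase()
    if phase is Phase.REJECTING:
        # Fail fast so that clients see an error rather than a timeout.
        return {"error": "server is shutting down"}
    response = {"result": serve(request)}
    if phase is Phase.LAME_DUCK:
        # Still serving, but piggyback the notification on the response.
        response["shutting_down"] = True
    return response
```

Dropping network connections after the rejecting phase is left out, since it depends on the transport in use.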

Clear queues and thread pools

Since there will be no new tasks, we should focus on finishing all tasks that are still in the system. Make sure all items in queues are processed and all scheduled closures in thread pools are executed. Alternatively, if the type of work allows it, we can simply drop the thread pools along with all pending closures.
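As a concrete illustration of the two options, here is a minimal sketch using Python's standard ThreadPoolExecutor. The function names drain and abandon are made up for this example; the cancel_futures argument requires Python 3.9 or later.

```python
from concurrent.futures import ThreadPoolExecutor


def drain(pool):
    """Stop accepting work, then block until every submitted closure has run."""
    pool.shutdown(wait=True)


def abandon(pool):
    """Stop accepting work and cancel closures that have not started yet.

    Closures that are already running are still allowed to finish.
    """
    pool.shutdown(wait=True, cancel_futures=True)
```

After either call, submitting more work to the pool raises RuntimeError, which matches the assumption that no new tasks arrive at this stage.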

Not every thread pool supports shutdown. If shutdown is not supported, we can work around that by wrapping the resources that require shutdown in reference counts, to make sure they are not released too early.
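A reference-counted wrapper could look roughly like the sketch below. RefCounted and its methods are hypothetical names; the idea is that the owner holds one reference, every in-flight task holds another, and the resource is closed only when the last one is given back.

```python
import threading


class RefCounted:
    """Hypothetical wrapper: closes a resource once the last reference is gone."""

    def __init__(self, resource, close):
        self._resource = resource
        self._close = close
        self._lock = threading.Lock()
        self._refs = 1  # the owner's reference

    def acquire(self):
        """Take a reference on behalf of a task and return the resource."""
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("resource already released")
            self._refs += 1
        return self._resource

    def release(self):
        """Give a reference back; the last release closes the resource."""
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("released more often than acquired")
            self._refs -= 1
            last = self._refs == 0
        if last:
            # Runs outside the lock so a slow close does not block other callers.
            self._close(self._resource)
```

At shutdown the owner calls release() once. If tasks are still running, the close is deferred until the last of them calls release() as well.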

If green threads are used instead of thread pools, we can practice good code hygiene and obtain join handles on all green threads we spawn. Or we could fall back to reference counts.
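Using asyncio tasks as a stand-in for green threads, keeping join handles might be sketched as follows. TaskRegistry is a hypothetical helper, and the same pattern should carry over to other green-thread libraries that return a handle on spawn.

```python
import asyncio


class TaskRegistry:
    """Hypothetical helper: keeps a handle on every spawned task."""

    def __init__(self):
        self._tasks = set()

    def spawn(self, coro):
        # Must be called while an event loop is running.
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        # Forget the handle once the task is done, so the set does not grow.
        task.add_done_callback(self._tasks.discard)
        return task

    async def join_all(self):
        # Loop because running tasks may spawn further tasks.
        while self._tasks:
            await asyncio.gather(*self._tasks, return_exceptions=True)
```

Passing return_exceptions=True means that one failing task does not prevent the others from being joined.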

Another option is to have a global counter of pending tasks and closures. Once the counter reaches zero, we can be sure that all closures have been executed. There are performance implications: contention on the counter puts a lot of pressure on the CPU cache, because every core that updates it has to take ownership of the same cache line. That should not be a problem unless your workload is extremely performance-hungry or latency-sensitive.
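A minimal sketch of such a counter is shown below. PendingCounter and wrap are hypothetical names. The count is incremented when a closure is wrapped (that is, when it is scheduled) and decremented when it finishes, so closures that are still sitting in a queue are counted too.

```python
import threading


class PendingCounter:
    """Hypothetical global counter of scheduled-but-unfinished closures."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def wrap(self, fn):
        """Count `fn` as pending from now until it has finished running."""
        with self._cond:
            self._pending += 1

        def run(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            finally:
                with self._cond:
                    self._pending -= 1
                    if self._pending == 0:
                        self._cond.notify_all()

        return run

    def wait_for_zero(self, timeout=None):
        """Block until nothing is pending; returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)
```

One caveat of this design: a wrapped closure that is dropped without ever running keeps the counter above zero, so it does not combine well with the option of discarding pending closures.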

Releasing resources

After all tasks and green threads are either dropped or completed, we can start releasing shared resources. Leave some threads behind for the cleanup and wait for all of them to finish. Drop queues and thread pools. Flush logs and metric counters. Close database connections. Release memory. We are done!

Organizing resources for graceful shutdown

Organizing resources carefully can make graceful shutdown easier.

Wrap globally shared resources in shared pointers. If possible, put globally shared resources into the thread-local storage of the threads in the thread pools. Create a global RequestContext that knows how to fetch those resources. All tasks can use the same RequestContext. The alternative is to create a new shared pointer for each task, which has slightly worse performance.
Global resources that do not need special care at shutdown (e.g. database connections) can simply be dropped.
Create objects for cleaning up globally shared resources that require a proper shutdown (e.g. notification subscriptions). Think of such an object as a JoinHandle, but for resources instead of threads. This object should be connected to the shared pointer mentioned above, so that it can wait for, or double check, that all references are dropped.
Print logs when resources are correctly released. Log errors if the wait is too long, or if a resource is accessed after it is released. This can be done in the RequestContext mentioned above.
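The points above can be combined into one rough sketch. Python has no shared pointers, so an explicit user count stands in for them here. Shared, checkout, checkin, release and the use method on RequestContext are hypothetical names chosen for this illustration.

```python
import contextlib
import logging
import threading

log = logging.getLogger("shutdown")


class Shared:
    """Hypothetical: a named resource plus a count of tasks currently using it."""

    def __init__(self, name, resource, close):
        self.name = name
        self._resource = resource
        self._close = close
        self._cond = threading.Condition()
        self._users = 0
        self._released = False

    def checkout(self):
        with self._cond:
            if self._released:
                log.error("%s accessed after release", self.name)
                raise RuntimeError(f"{self.name} accessed after release")
            self._users += 1
            return self._resource

    def checkin(self):
        with self._cond:
            self._users -= 1
            self._cond.notify_all()

    def release(self, timeout):
        """Plays the role of a join: wait for all users to leave, then close."""
        with self._cond:
            if self._released:
                return True
            if not self._cond.wait_for(lambda: self._users == 0, timeout):
                log.error("%s still in use after %ss", self.name, timeout)
                return False
            self._released = True
        self._close(self._resource)
        log.info("%s released", self.name)
        return True


class RequestContext:
    """Hypothetical global entry point through which tasks fetch resources."""

    def __init__(self, resources):
        self._by_name = {r.name: r for r in resources}

    @contextlib.contextmanager
    def use(self, name):
        shared = self._by_name[name]
        resource = shared.checkout()
        try:
            yield resource
        finally:
            shared.checkin()
```

A task would write `with ctx.use("subscription") as sub:` around its work, and the shutdown path would call release(timeout) on each Shared object once the queues are empty.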