Node.js Health Checks and Overload Protection

Explaining “Health Checks” and “Overload Protection”.

nairihar · Jun 21, 2018 · 5 min read

One of the essential parts of server-side development is keeping servers in a stable condition and preventing overload, which can crash them.

In this article, I will explain "Health Checks" and "Overload Protection", show solutions to some common problems, and give guidelines on how to implement them.

Imagine you have 3 Node servers behind an Nginx load balancer. The load is divided equally, so if you have 600 users, every server handles 200 clients. But dividing requests equally does not protect you from overload, because the amount of work can differ for every user: for user_1 you may read three files, while for user_2 you may need to read nine (3 times more). Since the cost of a request varies, this is the real problem you need to investigate and resolve. If the work differs and the balancer keeps sending requests to a server that is already overloaded, that server will probably crash.

"Health Checks" and "Overload Protection" exist for problems like this.

Health Checks

The load balancer sends a request to each server every n (e.g. 5 or 10) seconds to find out whether the server can handle more requests. If the server responds positively, the balancer marks it as UP and continues sending it traffic; otherwise, the balancer marks it as DOWN and stops routing requests to it until a later health check succeeds and the server is marked UP again.

This process is called Health Check.

The check can be a simple HTTP request (e.g. GET), a socket check, or a plain TCP connection.
When the server receives a health check request, it can run some checks to decide whether it can handle more traffic, and then it needs to respond. Sending status 200 means everything is fine and the server can handle more requests. Otherwise, you can send status 503 SERVICE UNAVAILABLE, which means the server cannot handle more requests.

Example of Health Checks

Unfortunately, open source Nginx does not natively support active health checks. For that, you need to install a third-party module called nginx_upstream_check_module (not distributed with the Nginx source).

ngx_http_healthcheck_module sends a request to each server; if it responds with HTTP 200 (plus an optional expected body), it is marked suitable. Otherwise, it is marked bad.

But I do not want to make this difficult, so instead we will use an alternative load balancer: HAProxy.

See the installation part here (you only need the "Installing HAProxy" part).

I will not explain HAProxy in full, because that would take too long and our goal is not to understand HAProxy itself. We only need to understand how to build a simple health checking process, so I will explain the essential parts.

Here is a simple server which has two routes, one route for health checking and the other one for us.

Run it using the command:
PORT=8000 node server_1.js

In the browser, and also in the console, you can see the PID number, which is the process id of the Node.js server that handled the request, so you can tell which node received your request after HAProxy balanced it.

And here is the configuration for HAProxy.
Set up the haproxy.cfg file and run the HAProxy service.
Here you can see how to add the configuration file and start the service.
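The original configuration file is not reproduced here; based on the values used in this article (frontend on port 3000, roundrobin balancing, checks every 5 seconds, rise 2, fall 1), it could look roughly like this:

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http_front
    bind *:3000
    default_backend trackers

backend trackers
    balance roundrobin
    option httpchk GET /health
    server server_1 127.0.0.1:8000 check inter 5s rise 2 fall 1
    server server_2 127.0.0.1:8001 check inter 5s rise 2 fall 1
```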

As you can see, we bind the frontend to port 3000 and define a backend called trackers with two servers on ports 8000 and 8001, balanced in roundrobin fashion.

  • rise (count): the number of consecutive valid health checks required before the server is considered UP. The default value is 2.
  • fall (count): the number of consecutive invalid health checks required before the server is considered DOWN. The default value is 3.

Make sure that server_1 is running.
Start HAProxy service.

Now you can see health check requests arriving at server_1. Once the server responds to two consecutive checks with status 200, HAProxy marks it UP and starts balancing requests to it. Before that, the HAProxy endpoint (http://localhost:3000) is unavailable (try opening it before two consecutive health checks have passed). After two successful responses, you can see the result in the browser at http://localhost:3000. For now, all requests go to server_1, because server_2 (:8001) is not running.

Before running server_2, let's look at the code and understand it.

This server responds with status 200; after 20 seconds, it starts responding with 503. I think everything here is simple and easy to understand.

Let's go to the most exciting part.

Now run server_2 using the command:
PORT=8001 node server_2.js

Once both servers have passed two health checks, you can open the browser (http://localhost:3000) and watch the load balancing work: refresh multiple times and you will see the PID change.

After 20 seconds, when server_2 starts answering health checks with status 503, a single failed check (we have fall 1 in the config) is enough for HAProxy to mark the server DOWN and stop balancing requests to it; the whole load then goes to server_1.

HAProxy keeps sending health check requests every 5 seconds, and once it receives two consecutive 200 responses (we have rise 2 in the config), it marks server_2 as UP again and resumes balancing requests to it.

Overload Protection

To check whether the server is overloaded, and to protect it from overload, you need to monitor some metrics. Which ones matter depends on your code and what it does, but the following generic metrics are essential to check:

  • Event Loop delay
  • Used Heap Memory
  • Total Resident Set Size

Here is a good npm package, overload-protection, which checks these three metrics. Another good package is event-loop-lag, which measures event loop delay.

Using the overload-protection package, you can specify limits beyond which your server will stop accepting more requests. Once a configured limit is exceeded, the package automatically responds with 503 SERVICE UNAVAILABLE.
The package works with the http, express, restify, and koa packages.

But if your load balancer performs health checks over raw sockets and you want to handle them that way, you will need to use another package or build one yourself.

Summary

In this article, I explained how health checks work in HAProxy and how you can protect your server from overload. Every server should have at least a basic health check implementation; it is essential in distributed systems.

Thank you, feel free to ask any questions or tweet me @nairihar

Also follow my “JavaScript Universe” newsletter on Telegram: @javascript
