## Understanding the analysis

JavaScript is a single-threaded, event-driven, non-blocking language.

In Node.js, I/O tasks are delegated to the operating system, and JavaScript functions (callbacks)
are invoked once the related I/O operation is complete. At a rudimentary level, the process of
queueing events and later handling their results in-thread is conceptually achieved with the
"Event Loop" abstraction.

At a (very) basic level, the following pseudo-code demonstrates the Event Loop:
`while (event) handle(event)`

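As a slightly expanded illustration of that pseudo-code, here is a toy model of a single-threaded queue being drained. It is only a sketch: the names `queue`, `emit` and `run` are made up for this example and do not correspond to Node.js internals, which are considerably more involved.

```javascript
// Toy model of an event loop: events are queued, then handled one at a time
// on a single thread. `queue`, `emit` and `run` are illustrative names only.
const queue = []

function emit (name, handler) {
  queue.push({ name, handler })
}

function run () {
  const handled = []
  let event
  // Equivalent of `while (event) handle(event)`
  while ((event = queue.shift()) !== undefined) {
    event.handler(event) // a handler may queue further events
    handled.push(event.name)
  }
  return handled
}

// The 'request' handler queues a follow-up event, which is handled
// only after everything already waiting in the queue.
emit('request', () => emit('db-result', () => {}))
emit('timer', () => {})
console.log(run())
```

Each handler runs to completion before the next event is taken from the queue, which is why a slow handler delays everything behind it.
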
The Event Loop paradigm leads to an ergonomic development experience for high-concurrency programming
(relative to the multi-threaded paradigm).

However, since the Event Loop operates on a single thread, it is essentially a shared
execution environment for every potentially concurrent action. This means that if the
execution time of any synchronous piece of code exceeds an acceptable threshold, it interferes with
the processing of future events (for instance, an incoming HTTP request). New events cannot
be processed because the thread that would process them is
blocked by a long-running synchronous operation.

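The effect can be reproduced in a few lines. The sketch below uses a hypothetical `blockFor` helper (a busy-wait standing in for any long-running synchronous work) and shows that a timer due in 10ms cannot fire until the thread is free again. The durations are arbitrary values chosen for illustration.

```javascript
// Busy-wait for `ms` milliseconds, standing in for any long-running
// synchronous work. `blockFor` is a hypothetical helper for this example.
function blockFor (ms) {
  const end = Date.now() + ms
  while (Date.now() < end) {}
}

const scheduled = Date.now()

// This timer is due in 10ms, but its callback cannot run until the
// synchronous code below has finished and the thread is free.
setTimeout(() => {
  console.log(`timer fired after ${Date.now() - scheduled}ms (requested 10ms)`)
}, 10)

blockFor(200) // the event loop is stalled for roughly 200ms
```

An incoming HTTP request would be delayed in the same way as the timer callback here.
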
Asynchronous operations are those which queue an event for later handling. They tend to be
identified by an API that requires a callback or uses promises (or async/await).

Synchronous operations, by contrast, simply return a value. Long-running synchronous operations are either
functions that perform blocking I/O (such as `fs.readFileSync`) or potentially resource-intensive
algorithms (such as `JSON.stringify` or React's `renderToString`).

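To make the distinction concrete, the sketch below reads the same file three ways using Node's built-in `fs` module. The temporary file name and the `readAsync` function are assumptions made for this example.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Create a small file to read (the file name is arbitrary).
const file = path.join(os.tmpdir(), 'event-loop-example.txt')
fs.writeFileSync(file, 'hello event loop')

// Synchronous: the call returns the value, and the thread is blocked
// until the read has finished.
const text = fs.readFileSync(file, 'utf8')
console.log('sync read:', text)

// Asynchronous (callback): the read is queued and the callback is
// invoked later, once the I/O operation is complete.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err
  console.log('callback read:', data)
})

// Asynchronous (promise with async/await): also non-blocking,
// with a different API shape.
async function readAsync (target) {
  return fs.promises.readFile(target, 'utf8')
}
readAsync(file).then((data) => console.log('promise read:', data))

console.log('this line is logged before either asynchronous result')
```

Only the `readFileSync` call holds up the thread while the operating system performs the read.
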
To solve the Event Loop issue, we need to find out where the synchronous bottleneck is.
Commonly, it can be identified as a single long-running synchronous function. Alternatively,
the bottleneck may be distributed across many functions, which takes rather more detective work.

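Before hunting for the bottleneck, it can help to confirm that the loop really is being delayed. One possible approach, sketched below, uses `monitorEventLoopDelay` from Node's built-in `perf_hooks` module, which reports delays in nanoseconds. The `blockFor` and `nsToMs` helpers, the 10ms resolution and the timings are all assumptions made for this example. This does not locate the bottleneck; it only measures its effect.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks')

// The histogram reports nanoseconds; convert to milliseconds for display.
const nsToMs = (ns) => ns / 1e6

// Stand-in for a synchronous bottleneck (hypothetical helper).
function blockFor (ms) {
  const end = Date.now() + ms
  while (Date.now() < end) {}
}

const histogram = monitorEventLoopDelay({ resolution: 10 }) // sample every 10ms
histogram.enable()

// Simulate a single long-running synchronous operation.
setTimeout(() => blockFor(100), 50)

// Stop sampling after half a second and report what was observed.
setTimeout(() => {
  histogram.disable()
  console.log(`mean delay: ${nsToMs(histogram.mean).toFixed(1)}ms`)
  console.log(`max delay:  ${nsToMs(histogram.max).toFixed(1)}ms`)
  console.log(`p99 delay:  ${nsToMs(histogram.percentile(99)).toFixed(1)}ms`)
}, 500)
```

A maximum far above the mean hints at one long stall, whereas a uniformly raised mean may point towards a distributed bottleneck. Treat this as a rough heuristic rather than a diagnosis.
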
## Next Steps
- If the system is already deployed, mitigate the issue immediately by implementing
  HTTP 503 Service Unavailable functionality (see *Load Shedding* in **Reference**)
  + This should allow the deployment's load balancer to route traffic to a different service instance
  + In the worst case, the user receives the 503 and must retry (this is still preferable to waiting for a timeout)
- Use `clinic flame` to generate a flamegraph
  + Run <code class='snippet'>clinic flame --help</code> to get started
  + See the "Understanding Flamegraphs and how to use [0x](https://www.npmjs.com/package/0x)" article in the **Reference** section for more information
- Look for "hot" blocks: functions that are observed, at a higher relative frequency, to be at the top of the stack per CPU sample. In other words, such functions are blocking the event loop
- In the case of a distributed bottleneck, start by looking for lots of wide tips at the top of the flamegraph

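The load shedding step above can be sketched with nothing but the built-in `http` module. This is a hand-rolled illustration, not the API of any of the packages listed under **Reference**. The thresholds, the `Retry-After` value, the port and the names `lagMs`, `shouldShed` and `handler` are all assumptions.

```javascript
const http = require('http')

const SAMPLE_INTERVAL_MS = 50 // how often lag is sampled (assumed value)
const MAX_LAG_MS = 100 // shed load above this much lag (assumed value)

let lagMs = 0
let last = Date.now()

// A timer that fires late means the loop was busy; the lateness
// is used as an estimate of event loop lag.
const sampler = setInterval(() => {
  const now = Date.now()
  lagMs = Math.max(0, now - last - SAMPLE_INTERVAL_MS)
  last = now
}, SAMPLE_INTERVAL_MS)
sampler.unref() // do not keep the process alive just for sampling

function shouldShed (lag, maxLag = MAX_LAG_MS) {
  return lag > maxLag
}

function handler (req, res) {
  if (shouldShed(lagMs)) {
    // Fail fast so that a load balancer can route to another instance.
    res.statusCode = 503
    res.setHeader('Retry-After', '10')
    res.end('Service Unavailable')
    return
  }
  res.end('ok')
}

const server = http.createServer(handler)
// server.listen(3000) // uncomment to serve; port 3000 is an arbitrary choice
```

In production, one of the maintained packages from the **Reference** section is likely a better fit than this sketch, which exists only to show the mechanism.
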
## Reference

- Load Shedding
  + Express, Koa, Restify, `http`: [overload-protection](https://www.npmjs.com/package/overload-protection)
  + Hapi: [Server load sampleInterval option](https://hapi.dev/api/#-serveroptionsload) & [Server connections load maxEventLoopDelay](https://hapijs.com/api#-serveroptionsload)
  + Fastify: [under-pressure](https://www.npmjs.com/package/under-pressure)
  + General: [loopbench](https://www.npmjs.com/package/loopbench)
- [Concurrency model and Event Loop](https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop)
- [Overview of Blocking vs Non-Blocking](https://nodejs.org/en/docs/guides/blocking-vs-non-blocking/)
- [Don't Block the Event Loop (or the Worker Pool)](https://nodejs.org/en/docs/guides/dont-block-the-event-loop/)
- Understanding Flamegraphs and how to use 0x: [Tuning Node.js app performance with autocannon and 0x](https://www.nearform.com/blog/tuning-node-js-app-performance-with-autocannon-and-0x/)