Monitoring the Garbage Collection of a node application

Throw away by https://undraw.co/

Monitoring a running nodejs application in production is a critical activity which allows detecting applications performance degradation automatically without manually testing and looking at it.

Besides monitoring only the total memory utilisation, the monitoring of Garbage Collection performance over time can expose more information about a potential problem in the application which can lead to a memory leak.

Running a node app with the --trace_gc flag will write GC information to the logs at every GC event. Following is the output after running an example app, e.g. node --trace_gc app.js

From the GC logs you can observe that there are two types of entries:

  • Scavenge (new space garbage collection) — an often running and fast GC which is responsible for cleaning up relatively small objects from the new space on the heap
  • Mark-sweep (old space garbage collection) — a GC event which occurs less often because it takes longer to detect which objects can be cleaned up. As the name of the event tells it consists of two actions. First, the Mark-sweep algorithm goes through all of the objects which have survived the Scavenge cycles. And starting from the heap’s root it marks each of the objects in the graph. After marking is done it sweeps objects which cannot be reached from the root object.

How to read the GC cycle entries?

Both types of entries contain data separated by single or multiple spaces.

The Scavenge cycle

[30420:0x103e45000] 99 ms: Scavenge 3.8 (5.7) -> 3.3 (8.2) MB, 0.6 / 0.0 ms (average mu = 1.000, current mu = 1.000) allocation failure

The Scavenge entry contains:

  • The process id and memory address — [30420:0x103e45000]
  • The timestamp of the event from the application start — 99 ms
  • The type — Scavenge
  • Heap used and total heap allocated before GC — 3.8 (5.7)
  • Heap used and total heap allocated after GC — 3.3 (8.2)
  • Scavenge duration and time spent in external callbacks — 0.6 / 0.0 ms
  • And the reason — allocation failure

The Mark-sweep cycle

[30420:0x103e45000] 8168 ms: Mark-sweep 6.4 (9.7) -> 4.5 (7.1) MB, 2.6 / 0.0 ms (+ 0.2 ms in 3 steps since start of marking, biggest step 0.1 ms, walltime since start of marking 13 ms) (average mu = 1.000, current mu = 1.000) finalize incremental marking via task GC in old space requested

The Mark-sweep entry contains the same data as Scavenge with some extra information regarding the marking phase:

  • The time it spent in incremental marking — + 0.2 ms
  • How many steps it took to mark all of the objects — 3 steps since start of marking
  • The longest time for an incremental marking step — biggest step 0.1 ms

Now that you have enabled monitoring for Garbage Collection cycles of a node app what are some of the indications one should watch after.

  • GC cycle duration. Longer Scavenge or Mark-sweep cycles leads to longer latency for application requests because of the node being single-threaded every GC cycle will interrupt the execution of the application
  • The amount of released memory. In a healthy application, every GC cycle should release unreachable objects and the size of the total heap allocated should return to the level of idle application. When we notice that the allocated memory increases along with the application request count increase. And it’s not released after request count decreases. This points out that somewhere in the application is a memory leak which eventually will lead to an application crash.

Conclusion

Tracking down memory leaks is a challenging activity which cannot be done without data and internal insight about what exactly the application is doing to manage its memory.

Node has a built-in mechanism to output information about V8 engine’s Garbage Collection cycles. Which you can use to monitor how often the GC cycles happen, how long it takes and how much memory it frees up by comparing the total heap used before and after the GC cycle.

Thank you for your attention and if you found this post useful and would like to read more about random web development topics, just clap for this article or drop a comment here. As always you can find me on Twitter@andrejsabrickis

Writing JS, TS, Vue, #C, and fostering teams to release customer value n-times a day. Creator of billid.app and writer on abrickis.me