Our OpenResty® open source web platform is known for its high execution speed and small memory footprint. We have users running complex OpenResty applications inside embedded devices like robots. And people have been observing significant reductions in memory usage when they migrate applications from other technology stacks like Java, NodeJS, and PHP. Still, sometimes we may want to optimize the memory usage of a particular OpenResty application which uses too much memory and/or leaks memory. Such applications may contain bugs or deficiencies in their Lua code, Nginx configurations, and/or 3rd-party Lua libraries and Nginx modules.

To effectively debug and optimize excessive memory usage or memory leaks, it is always beneficial to have a good understanding of how OpenResty, Nginx, and LuaJIT allocate and manage memory under the hood. Our commercial product, OpenResty XRay, can also automatically analyze and troubleshoot a wide range of memory usage issues in any OpenResty application without any modifications or compromises, even in an online production environment. This post is the first of a series of articles explaining memory allocations and management in OpenResty, Nginx, and LuaJIT, with real-world sample data and graphs provided by OpenResty XRay.

We start with an introduction to the system-level memory usage breakdown for Nginx processes and then take a look at the memory allocated and managed by the various allocators on the application level.

On The System Level

In modern operating systems, processes request and use virtual memory on the highest level. The operating system manages the virtual memory for each process and maps the virtual memory pages that are actually used to physical memory pages backed by hardware (like DDR4 RAM sticks). It is important to note that a process may request a lot of virtual memory space but use merely a small portion of it. For example, a process can often successfully claim, say, 2TB of virtual memory from the operating system even though the system may only have 8GB of RAM (depending on the operating system's overcommit policy). This will go without any problems as long as that process does not write to too many memory pages of this huge virtual memory space. It is this potentially small portion of the virtual memory space that actually gets mapped to the physical memory devices and thus is what we really care about. So never panic when you see large virtual memory usage (usually named VIRT) in tools like ps and top.
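The gap between claimed and used virtual memory can be observed with a small experiment. The following is a minimal Python sketch, not anything OpenResty or Nginx does itself. It assumes a Linux system that exposes /proc/self/statm; the function names read_statm and reserve_and_touch and the 256 MiB and 16 MiB sizes are our own illustrative choices.

```python
import mmap
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def read_statm():
    """Return (virtual_bytes, resident_bytes) for this process (Linux only)."""
    with open("/proc/self/statm") as f:
        # The first two fields are total program size and resident set size, in pages.
        size_pages, resident_pages = f.read().split()[:2]
    return int(size_pages) * PAGE_SIZE, int(resident_pages) * PAGE_SIZE


def reserve_and_touch(reserve_bytes, touch_bytes):
    """Map reserve_bytes of anonymous memory, then write to the first touch_bytes.

    Returns three (virtual, resident) samples: before mapping, after mapping,
    and after touching.
    """
    before = read_statm()
    # Private anonymous mapping: address space is claimed, but no page is written yet.
    region = mmap.mmap(-1, reserve_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_reserve = read_statm()
    # Writing one byte per page forces the kernel to back that page with physical memory.
    for offset in range(0, touch_bytes, PAGE_SIZE):
        region[offset] = 1
    after_touch = read_statm()
    region.close()
    return before, after_reserve, after_touch


if __name__ == "__main__":
    MiB = 1024 * 1024
    samples = reserve_and_touch(256 * MiB, 16 * MiB)
    for label, (virt, rss) in zip(("before", "reserved", "touched"), samples):
        print(f"{label:>8}: VIRT={virt // MiB} MiB  RSS={rss // MiB} MiB")
```

On a typical Linux box, the virtual size should jump right after the mapping is created, while the resident size should grow only after the pages are written to.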

The small portion of the virtual memory which is actually used (meaning data has been written to it) is usually called RSS or resident memory. When the system is running out of physical memory and some part of the resident memory gets swapped out to the disk[1], this swapped-out portion is no longer part of the resident memory and becomes the “swapped memory” (or “swap” for short).

Many tools can provide the size of virtual memory, resident memory, and swapped memory for any process (including OpenResty’s nginx worker processes). OpenResty XRay can render a pretty pie graph like below for an nginx worker process.
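For readers without such tools at hand, the same three numbers can be read directly from the Linux kernel. Below is a minimal Python sketch, assuming a Linux system where /proc/<pid>/status carries the VmSize, VmRSS, and VmSwap fields. The helper names parse_status and memory_summary are ours, and this is not how OpenResty XRay collects its data.

```python
def parse_status(text):
    """Extract virtual, resident, and swapped sizes (in kB) from /proc/<pid>/status text."""
    wanted = {"VmSize": "virtual", "VmRSS": "resident", "VmSwap": "swap"}
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # The value looks like "   40960 kB"; keep only the number.
            result[wanted[key]] = int(rest.split()[0])
    return result


def memory_summary(pid="self"):
    """Read and parse /proc/<pid>/status for the given process ID (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())
```

Calling memory_summary() with the process ID of an nginx worker (taken from ps, for example) returns a dictionary with the virtual, resident, and swap sizes in kB. Reading another user's process may require sufficient privileges.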

Virtual Memory Usage Breakdown for an Nginx Worker Process

In this graph, the full pie represents the whole virtual memory space claimed by the nginx worker process. The Resident Memory component indicates the resident memory usage, that is, the memory pages actually used. Finally, the Swap component, which is not shown here, represents the swapped-out portion.

As mentioned above, we usually care most about the Resident Memory component. That is usually the absolute focus of any memory usage optimizations. It is also quite alarming if the Swap component shows up, which means the physical memory is no longer enough and the operating system may get overloaded by swapping memory pages in and out. We should also pay attention to the unused portion of the virtual memory space, however. It may be that we allocate some Nginx shared memory zones way too big and they may someday bite us badly when they actually get filled up with data (in which case they would become part of the Resident Memory component).

We will cover resident memory in more detail in another article. Next let’s take a look at memory usage on the application level.

On The Application Level

It is much more useful to inspect memory usage further on the application level. In particular, we care about how much memory currently used is allocated by the LuaJIT memory allocator, how much is used by the Nginx core and its modules, and how much by Nginx’s shared memory zones.

Consider the following pie graph generated by OpenResty XRay when analyzing an unmodified nginx worker process of an OpenResty application.

Application-Level Memory Usage Breakdown for an Nginx Worker Process

The Glibc Allocator

The Glibc Allocator component in the pie graph indicates the total memory size allocated by the Glibc library, which is the GNU implementation of the standard C runtime library. This allocator is usually invoked on the C language level through function calls like malloc(), realloc(), and calloc(). It is also commonly known as the system allocator. Note that the Nginx core and its modules also allocate memory via this system allocator (an important exception is Nginx’s shared memory zones, which we will discuss shortly below). Some Lua libraries with C components or FFI calls may sometimes directly invoke the system allocator, but it is more common for them to utilize LuaJIT’s allocator. The OpenResty or Nginx build may choose to use a different C runtime library than Glibc, like musl libc. We will extend our discussions on system allocators and Nginx’s allocator in a dedicated article.

Nginx Shared Memory

The pie’s Nginx Shm Loaded component is for the total size of the actually used portion of the shared memory (or “shm”) zones allocated by the Nginx core or its modules. Shared memory zones are allocated via the mmap() system call directly and thus bypass the standard C library’s allocator completely. Nginx shared memory zones are shared among all its worker processes. Common examples are those defined by the standard Nginx directives like ssl_session_cache, proxy_cache_path, limit_req_zone, limit_conn_zone, and upstream’s zone. They also include shared memory zones defined by Nginx’s 3rd-party modules like OpenResty’s core component ngx_http_lua_module. OpenResty applications usually define their own shared memory zones through this module’s lua_shared_dict directive in Nginx configuration files. We will cover the Nginx shm zones’ memory issues in great detail in another dedicated article.
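The idea of an mmap()-backed zone shared across processes can be illustrated with a rough analogy in Python. This is only a sketch of the mechanism on a Unix-like system, not Nginx's actual implementation; the function name share_between_processes is hypothetical, and the forked child merely plays the role of a worker process.

```python
import mmap
import os


def share_between_processes(message: bytes) -> bytes:
    """Write message in a forked child and read it back in the parent.

    The message must fit into a single memory page.
    """
    # On Unix, a file descriptor of -1 with default flags gives an anonymous
    # MAP_SHARED mapping, which is obtained via mmap() rather than malloc().
    zone = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()
    if pid == 0:
        # Child ("worker"): write into the shared zone and exit immediately.
        zone[:len(message)] = message
        os._exit(0)
    # Parent: wait for the child, then observe the child's write.
    os.waitpid(pid, 0)
    data = bytes(zone[:len(message)])
    zone.close()
    return data


if __name__ == "__main__":
    print(share_between_processes(b"hello from a worker"))
```

Because the mapping is shared rather than private, the parent sees the bytes written by the child. Only the pages that have actually been written to count toward resident memory, which is why the pie graph distinguishes the loaded portion of the zones.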

LuaJIT Allocator

The HTTP/Stream LuaJIT Allocator components in the pie represent the total memory size allocated and managed by LuaJIT’s builtin allocator, one for the LuaJIT virtual machine (VM) instance in the Nginx HTTP subsystem and one for the VM instance in the Nginx Stream subsystem. LuaJIT also has a build option[2] to use the system allocator, but it is mostly used only with special testing and debugging tools (like Valgrind and AddressSanitizer). Lua strings, tables, functions, cdata objects, userdata objects, upvalues, etc., are all allocated by this allocator. On the other hand, primitive Lua values like integers[3], numbers, light userdata values, and booleans do not require any dynamic allocations. C-level memory blocks allocated by LuaJIT’s ffi.new() calls on the Lua land are also allocated by LuaJIT’s own allocator. All the memory blocks allocated by this allocator are managed by LuaJIT’s garbage collector (GC), and therefore it is not the user’s responsibility to free those memory blocks when they are no longer needed[4]. Naturally, these memory objects are also called "GC objects". We will elaborate on this in another article.

Program Code Sections

The pie’s Text Segments component corresponds to the total size of the .text segments of both the executable and the dynamically loaded libraries for the target process mapped into the virtual memory space. The .text segments usually contain executable machine code.
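A rough estimate of this component can be derived from the process's memory map on Linux. The Python sketch below sums the sizes of file-backed executable mappings listed in /proc/<pid>/maps. It is only an approximation under our own assumptions: such mappings may also hold other executable sections besides .text, the function name executable_mapping_bytes is ours, and OpenResty XRay may compute its figure differently.

```python
def executable_mapping_bytes(maps_text):
    """Sum the sizes of file-backed executable mappings in /proc/<pid>/maps text."""
    total = 0
    for line in maps_text.splitlines():
        # Fields: address range, permissions, offset, device, inode, pathname.
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        addr_range, perms, _offset, _dev, _inode, path = fields
        # Keep executable mappings backed by a real file; skip [vdso], [stack], etc.
        if "x" in perms and path.startswith("/"):
            start, end = (int(part, 16) for part in addr_range.split("-"))
            total += end - start
    return total


def executable_bytes_of(pid="self"):
    """Read /proc/<pid>/maps and return the estimate in bytes (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return executable_mapping_bytes(f.read())
```

Since these mappings are read-only and backed by files, the pages of the same executable or library are normally shared by all the processes that map it.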

System Stacks

Finally, the System Stacks component in the pie refers to the total allocated size of all the system stacks (or “C stacks”) in the target process. Each operating system (OS) thread has its own system stack, so multiple stacks only occur when multiple OS threads are used (note that OpenResty’s “light threads” created by the ngx.thread.spawn API function are completely different from such system-level threads). Nginx worker processes usually have only one system thread unless OS thread pools are configured (via the aio threads configuration directive).

Other System Allocators

Some users may hook up 3rd-party memory allocators with their OpenResty or Nginx processes. Common examples are tcmalloc and jemalloc, which replace the system allocator (functions like malloc()) with faster implementations. They help speed up small and naive malloc() calls in some Nginx 3rd-party modules, Lua C modules, or some C libraries (OpenSSL included!), but they are not very useful for the parts of the software which already use a good allocator like Nginx’s memory pools and LuaJIT’s builtin allocator. Use of such "add-on" allocator libraries introduces new complexity and problems, which we will cover in great detail in a future post.

Used or Not Used

The application-level memory usage breakdown we just introduced gets a little complicated when we ask whether each component consists of used or unused virtual memory pages. Only the Nginx Shm Loaded component in the pie graph consists solely of virtual memory pages that are actually used. The other components include both kinds. Fortunately, memory allocated by the Glibc allocator and the LuaJIT allocator is usually used anyway, so the distinction does not make a big difference most of the time.

For Traditional Nginx Servers

Traditional Nginx servers are just a strict subset of OpenResty applications. Users of such servers still see system allocator memory and Nginx shared memory zone usage, among other things. OpenResty XRay can still be used to directly inspect and analyze such server processes, even in production. You will not see any Lua-related components if you have not compiled OpenResty’s Lua modules into your Nginx build, of course.

Conclusion

This article is the first in a series of posts which explain how OpenResty and Nginx allocate and manage memory, in the hope of helping optimize the memory usage of applications based on them. This post gives an overview of how memory is used and allocated on the highest level. In subsequent articles in the same series, we will take a closer look at each allocator and memory management facility, each in a dedicated article. Stay tuned!

A Word on OpenResty XRay

OpenResty XRay is a commercial product offered by OpenResty Inc. We use this product in our articles like this one to intuitively demonstrate implementation details, as well as statistics about real world applications and open source software. In general, OpenResty XRay can help users to get deep insight into their online and offline software systems without any modifications or any other collaborations, and efficiently troubleshoot really hard problems for performance, reliability, and security. It utilizes advanced dynamic tracing technologies developed by OpenResty Inc. and others.

You are welcome to contact us to join our Beta Test Community for this product. It’s free!

OpenResty XRay Console Dashboard

About The Author

Yichun Zhang is the creator of the OpenResty® open source project. He is also the founder and CEO of OpenResty Inc. He contributed a dozen open source Nginx 3rd-party modules and quite a few Nginx and LuaJIT core patches, and designed the OpenResty XRay platform.

Translations

We provide a Chinese translation for this article on blog.openresty.com ourselves. We also welcome interested readers to contribute translations in other natural languages as long as the full article is translated without any omissions. We thank them in advance.


  1. Modern Android operating systems do support swapping memory pages out, but to memory rather than to disk, with those pages compressed, which still saves physical memory space.

  2. This build option is -DLUAJIT_USE_SYSMALLOC. But never use it in production!

  3. By default, the LuaJIT runtime uses only a single number representation for both integers and numbers, namely double-precision floating-point numbers. The user can still pass the build option -DLUAJIT_NUMMODE=2 to enable the additional 32-bit integer representation at the same time.

  4. But it is still our responsibility to make sure all references to those useless objects are properly removed.