Check out how OpenResty XRay helps organizations troubleshoot issues and optimize the performance of their applications.

Learn More LIVE DEMO

Regular Expressions are ubiquitous in web applications, including Lua applications running atop OpenResty or Nginx (with OpenResty’s core components like the Lua Nginx module). OpenResty provides the ngx.re.* Lua API functions such as ngx.re.match and ngx.re.find to expose the PCRE regular expression library’s C API to the Lua land. PCRE provides a very efficient implementation of the Perl-compatible regular expressions with a Just-in-Time (JIT) compiler. PCRE is widely used by various famous systems software, including the Nginx core and the PHP runtime.

However, many web developers pay little attention to the efficiency of their regular expressions. Pathological behavior may occur for poorly written regular expressions, especially in backtracking implementations like PCRE. Worse, such pathological behavior may only get triggered by relatively large data input with particular characteristics. Some DoS attackers may even specifically target such vulnerable regular expression patterns.

Fortunately, our OpenResty XRay product provides dynamic-tracing tools that can analyze online OpenResty and Nginx processes in real-time and quickly locate those vulnerable and slow regular expressions. No special plugins or modules are needed in the target OpenResty or Nginx applications, and absolutely no overhead would incur when OpenResty XRay is not sampling.

In this tutorial, we will demonstrate such advanced analyzers with real-world examples. In the end, we’ll also look at alternatives with fewer pathological behaviors, including RE2 and the OpenResty Regex engine.

System Environment

Here we use a RedHat Enterprise Linux 7 system as an example. Any Linux distributions supported by OpenResty XRay should work equally fine, like Ubuntu, Debian, Fedora, Rocky, Alpine, etc.

We use an open-source OpenResty binary build as the target application. You can use any OpenResty or Nginx binaries, including those compiled by yourself. No special build options, plugins, or libraries are needed in your existing server installation or processes. It is the beauty of dynamic tracing technologies. It’s genuinely non-invasive.

We also have the OpenResty XRay’s Agent daemon running on the same system and have the command-line utilities from the openresty-xray-cli package installed and configured.

A Sample OpenResty/Nginx Lua Application

We have an inefficient OpenResty/Nginx Lua application that tops at 100% CPU usage in a server, as evidenced by the top command-line utility:

$ top -p 3584441
...
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3584441 nobody    20   0   41284   7184   4936 R 100.0   0.0   1:43.97 nginx

Note that the Nginx process’s CPU usage is 100.0%, and its PID is 3584441. We’ll reference this PID in our later dynamic-tracing analysis.

Narrowing Down the Culprit without Guessing

C-Land CPU Flame Graphs

To analyze any C/C++ processes (including Nginx processes) having high CPU usage, we should first obtain a C-land CPU flame graph1 from the process. Although OpenResty XRay can automatically detect OpenResty/Nginx applications and sample them and generate various kinds of flame graphs, here, we will use the manual way of running command-line utilities solely for demonstration purposes.

$ orxray analyzer run lj-c-on-cpu -p 3584441
Start tracing...
Go to https://x5vrki.xray.openresty.com/targets/68/history/664712 for charts.

We use the -p option to specify the target Nginx worker process’s PID. And we run the standard OpenResty XRay tool named lj-c-on-cpu. We don’t use the more generic c-on-cpu tool here because the former can automatically unwind dynamic machine code frames from LuaJIT and PCRE’s Just-in-Time (JIT) compilers.

From this C-land CPU flame graph, we can see that

  1. almost all the CPU time is for executing Lua code (note the highlighted ngx_http_lua_run_thread C function frame in the flame graph).
  2. And the Lua code is also executing PCRE matcher (note the pcre_exec function frame in the graph).
  3. Furhtermore, the PCRE JIT’d code is being executed, according to the _pcre_jit_exec function frame in the graph).

Note the power of this dynamic-tracing tool of transparently unwinding JIT’d code frames without any help from the target applications2.

The next step is sampling and generating a Lua-land CPU flame graph so that we can see what is going on in the Lua code.

Lua-Land CPU Flame Graphs

To generate a Lua-land CPU flame graph, we can use the standard analyzer, lj-lua-on-cpu, as in

$ orxray analyzer run lj-lua-on-cpu -p 3584441
Start tracing...
Go to https://x5vrki.xray.openresty.com/targets/68/history/664870 for charts.

The Lua-land flame graph also shows JIT’d Lua code, as evidenced by the trace#3:regex.lua:721 frame in the graph. That means a LuaJIT trace object of the trace ID 3, starting from the Lua code line at line 721 of the source file regex.lua. We can see that almost all the CPU time is on a single Lua code path. The most interesting Lua function frame in this dominating Lua code path is regex.lua:re_match. In that Lua file, we have

local re_match = ngx.re.match

So the re_match symbol name points to the ngx.re.match API function. Then the next question becomes: what regular expression is so slow? We could directly look up the Lua source line on content_by_lua(nginx.conf:69):10, as shown in the flame graph. But if the Lua code is generic and simply matching through a Lua table holding multiple regular expressions, it is not sufficient to pinpoint the regular expression in question.

The Slowest Regular Expressions

Knowing that it is the regular expression matching taking most of the CPU time, we can now use another standard analyzer named lj-slowest-ngx-re to find the slowest regular expressions matched in the current live process, as in

$ orxray analyzer run lj-slowest-str-match-find -p 3584441 --y-var threshold_ns=1 --y-var max_hits=1000
Start tracing...
sub_regex_cache_count: 0
http: match
cache: ptr: 0x154bbc0, type: match, pat: "\w"
cache: ptr: 0x1517890, type: match, pat: "(?:.*)*css"
cache: ptr: 0x154a3a0, type: match, pat: ".*?js"
http: substitution
http: function substitution
...WARNING: reach samples count: 1000
=== start ===
max latency: 15844343 ns, ptr: 0x1517890
max latency: 13555 ns, ptr: 0x154a3a0
max latency: 8075 ns, ptr: 0x154bbc0
=== finish ===
Go to https://x5vrki.xray.openresty.com/targets/68/history/799249 for charts.

We open the URL in a web browser to browse the resulting bar chart shown below.

This bar chart shows the maximum execution latency for the slowest regular expressions run inside the sampling time window. The latency is in nanoseconds. Obviously, (?:.*)*css is the absolute slowest regular expression, taking more than 20.6 milliseconds (or 20,671,282 nanoseconds). It is also way slower than the other ones. Knowledgeable users can quickly identify the use of a greedy quantifier, .*, with a constant suffix can result in aggressive backtracking in a regular expression engine. We can optimize the regular expression using a non-greedy version of the quantifier instead: (?:.*?)css.

After Optimizing the Slowest Regex

After optimizing the slowest regular expression (?:.*)*css to (?:.*?)css, we reload the application with the optimized regular expression and re-generate a new C-land CPU flame graph.

We can see the shapes and characteristics of the new flame graph are very different from the previous one. The bottleneck is no longer the code paths executing Lua code. Instead, it is now the writev system call responsible for writing out the HTTP response data (as highlighted by red in the graph above). Note the ngx_http_output_filter C function frame in the graph. This function is responsible for invoking the Nginx output filter chain to send out the HTTP response header and body data.

Let’s rerun the lj-slowest-ngx-re tool against the optimized application. The resulting chart is as follows.

We can see that the maximum latency of the regular expression is much shorter, merely 58,660 nanoseconds, or about 59 microseconds.

Limiting the execution overhead of PCRE

OpenResty’s Lua Nginx module provides the lua_regex_match_limit directive to set a hard limit on the basic operations performed for a single regular expression matching.

For example, we can put the following line into the nginx.conf file.

lua_regex_match_limit 100000;

And make sure we log any error returned by the ngx.re.match API function call:

local m, err = re_match(content, regexes[i], "jo")
if err ~= nil then
    ngx.log(ngx.ERR, "re_math failed, re: ", regexes[i], ", error: ", err)
    return
end

When the match limit is exceeded, we will get the following error log message:

2022/07/13 21:05:13 [error] 3617128#3617128: *1 [lua] content_by_lua(nginx.conf:69):11: re_math failed, re: (.*)*css, error: pcre_exec() failed: -8, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", host: "127.0.0.1"

Non-Backtracking Regular Expression Engines

PCRE has almost the full support for Perl regular expressions. The flexibility comes with a cost involved with backtracking. The user may also want to consider other non-backtracking implementations of regular expressions like Google’s RE2 or OpenResty Inc’s commercial OpenResty Regex engine. These engines usually employ algorithms based on the automaton theory. Unfortunately, the average matching time of RE2 is about 50% slower than PCRE for typical regular expressions though it has few pathological cases for slow matches. The OpenResty Regex engine, on the other hand, achieves similar average performance as PCRE (sometimes even faster) and is also free of backtracking issues.

Lua’s Built-in String Patterns

The standard Lua 5.1 language defines its own regular expression language syntax, as supported by its standard API functions string.match and string.find. OpenResty XRay also provides dynamic-tracing tools for analyzing such pattern-matching operations. We will cover this topic in another tutorial.

Tracing Applications inside Containers

OpenResty XRay tools support tracing containerized applications transparently. Both Docker and Kubernetes (K8s) containers work transparently. Just as with normal application processes, the target containers do not need any applications or extra privileges. The OpenResty XRay Agent daemon should run outside the target containers (like in the host operating system directly or in its own privileged container).

Let’s see an example. We first check the container name or container ID with the docker ps command.

$ docker ps
CONTAINER ID   IMAGE                                       COMMAND                  CREATED         STATUS          PORTS     NAMES
4465297209d9   openresty/openresty:1.19.3.1-2-alpine-fat   "/usr/local/openrest…"   18 months ago   Up 45 minutes             angry_mclaren

Here the container name is angry_mclaren. We can then find out the target process’s PID in this container.

$ docker top angry_mclaren
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                3605106             3605085             0                   19:40               ?                   00:00:00            nginx: master process /usr/local/openresty/bin/openresty -g daemon off;
nobody              3605692             3605106             0                   19:44               ?                   00:00:20            nginx: worker process

The PID for the openresty worker process is 3605692. We then run the OpenResty XRay analyzer against this PID as usual.

$ orxray analyzer run lj-slowest-ngx-re -p 3605692
Start tracing...
...
Go to https://x5vrki.xray.openresty.com/targets/68/history/684552 for charts.

OpenResty XRay is also able to automatically detect long-running processes as “applications” of a particular type (like “OpenResty”, “Python”, etc.).

How The Tools are Implemented

All the tools are implemented in the Y language. OpenResty XRay executes them with either the Stap+2 or eBPF3 backends of OpenResty XRay, both of which use the 100% non-invasive dynamic tracing technologies based on the Linux kernel’s uprobes and kprobes facilities.

We don’t require any collaborations from the target applications and processes. No log data or metrics data is used or needed. We directly analyze the running processes' process space in a strictly read-only way. And we also never inject any byte-code or other executable code into the target processes. It is 100% clean and safe.

The Overhead of the Tools

The dynamic-tracing tool lj-slowest-ngx-re demonstrated in this tutorial is very efficient and suitable for online execution.

When the tools are not running and actively sampling, the overhead on the system and the target processes are strictly zero. We never inject any extra code or plugins into the target applications and processes; thus, there’s no inherent overhead.

During sampling, the request latency only increases by about 7 microseconds (us) on average on typical server hardware. And the reduction in the maximum request throughput for the fastest OpenResty/Nginx server serving tens of thousands of requests per second on each CPU core is not measurable.

About The Author

Yichun Zhang (Github handle: agentzh), is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc..

Yichun is one of the earliest advocates and leaders of “open-source technology”. He worked at many internationally renowned tech companies, such as Cloudflare, Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. And its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.


  1. OpenResty XRay’s C-land CPU flame graphs are much more powerful than most of the open-source solutions. It supports unwinding JIT’d machine code without any collaborations from the target processes and it does support inlined C functions and detailed source file name and line numbers. ↩︎

  2. The open-source Linux perf tool chain has limited JIT’d code unwinding support that requires the target application to create a special perf.map file. That is special help and collaborations from the target application and it may lead to file data leaks and extra runtime overhead even if the user does not need to sample and profile. ↩︎

  3. Stap+ is OpenResty Inc’s greatly enhanced version of SystemTap↩︎