How OpenResty XRay Thoroughly Analyzes Nginx Memory Corruption Issues
Frequent crashes of Nginx worker processes are among the thorniest problems operations and development teams face, especially when the crashes stem from memory corruption: the root cause is buried under layers of complexity and hard to trace. This article reviews a real-world production case in which Nginx memory corruption caused repeated crashes, showing step by step how OpenResty XRay unraveled the issue and pinpointed a deep bug introduced by secondary development that violated Nginx’s connection pool management mechanism, providing an effective diagnostic approach for similar problems.
An Nginx Crash Case That Kept the Operations Team Up All Night
The customer’s online service experienced frequent crashes of Nginx worker processes, severely affecting service stability. These types of memory error-induced crashes are typically difficult to locate. From the surface-level crash stack, the error location is often not the root of the problem, as the “first scene” of the issue has already been corrupted. Traditional log analysis or code inspection methods are powerless in such scenarios. To quickly and precisely locate the problem, we utilized OpenResty XRay’s on-site recording capability to perform online recording of the Nginx worker processes. By replaying the recorded files to analyze the crash process, we were ultimately able to reconstruct the full picture of the problem.
How to Use OpenResty XRay to Precisely Capture the “First Scene” of Memory Corruption
OpenResty XRay experts analyzed the recorded files to identify the root cause for the customer. The following text demonstrates the core analysis process.
Preliminary Analysis: Locating the Crash Point
We first let the program run until the crash was reproduced, then used GDB to examine the stack trace (Backtrace) at the time of the crash.
```gdb
> c
> bt
#0  0x00000000004ccd1a in ngx_http_run_posted_requests (c=0x7f5fc3a010f8) at src/http/ngx_http_request.c:3152
#1  0x00000000004d047b in ngx_http_process_request_headers (rev=rev@entry=0x7f5fbfc010e0) at src/http/ngx_http_request.c:2021
#2  0x00000000004d10fd in ngx_http_process_request_line (rev=0x7f5fbfc010e0) at src/http/ngx_http_request.c:1473
#3  0x00000000004b2d89 in ngx_epoll_process_events (cycle=0x7f60a3888b90, timer=<optimized out>, flags=<optimized out>)
...
#9  0x0000000000482c0e in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:436
```
The crash occurred at line 3152 in the ngx_http_run_posted_requests function. The corresponding code is if (pr == NULL), where pr is assigned on the preceding line, pr = r->main->posted_requests;. In other words, the segmentation fault occurred while dereferencing r->main.
```c
3114 void
3115 ngx_http_run_posted_requests(ngx_connection_t *c)
3116 {
3117     ngx_http_request_t         *r;
3118     ngx_http_posted_request_t  *pr;
3119     for ( ;; ) {
3120         if (c->destroyed) {
3121             return;
3122         }
3123         r = c->data;
3124         pr = r->main->posted_requests;
3125         if (pr == NULL) {
3126             return;
3127         }
3128         r->main->posted_requests = pr->next;
3129         r = pr->request;
3130         ngx_http_set_log_request(c->log, r);
3131         ngx_log_debug2(NGX_LOG_DEBUG_HTTP, c->log, 0,
3132                        "http posted request: \"%V?%V\"", &r->uri, &r->args);
3133         r->write_event_handler(r);
3134     }
3135 }
```
When we attempted to print r->main, we found it was an invalid address.
```gdb
end 1,471,308> print r->main
$14 = (ngx_http_request_t *) 0xffffffff2e616e69
end 1,471,308> print r->main->posted_requests
Cannot access memory at address 0xffffffff2e6173b1
```
This confirms that the r pointer or its member main has been corrupted. In the code, the value of r comes from r = c->data, where c is a connection object of type ngx_connection_t. We suspect that c or c->data was incorrectly modified at some point.
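As an aside, the corrupted value itself offers a hint. A quick heuristic when a pointer holds an implausible value is to check whether its bytes are printable ASCII, which suggests string data was written over it. Below is a minimal sketch of that heuristic; the helper name printable_bytes is ours for illustration, not part of GDB or Nginx:

```c
#include <ctype.h>
#include <stdint.h>

/* Extract the printable ASCII bytes of a (little-endian) pointer-sized
 * value, lowest byte first. If most bytes are printable, the "pointer"
 * was likely overwritten by string data. */
static int printable_bytes(uint64_t v, char *out, int cap)
{
    int n = 0;
    for (int i = 0; i < 8 && n < cap - 1; i++) {
        unsigned char b = (unsigned char) (v >> (i * 8));
        if (isprint(b)) {
            out[n++] = (char) b;
        }
    }
    out[n] = '\0';
    return n;
}
```

For the corrupt value above, 0xffffffff2e616e69, the low four bytes decode to the characters "ina.", consistent with ASCII text having clobbered the pointer.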
Deep Tracing: Using Hardware Watchpoints
To identify who modified c->data and when, we set a hardware watchpoint on this memory address.
```gdb
end 1,471,308> print &c->data
$8 = (void **) 0x7f5fc3a010f8
end 1,471,308> watch *(void **)0x7f5fc3a010f8
Hardware watchpoint 7: *(void **)0x7f5fc3a010f8
```
With the watchpoint in place, we executed the rc (reverse-continue) command to run the process backward, allowing us to observe the changes to c->data.
```gdb
end 1,471,308> rc
```
As we continued the reverse execution, the watchpoint triggered multiple times, revealing a key sequence of operations (in reverse chronological order):
First trigger: c->data is modified.

In the ngx_http_lua_socket_resolve_retval_handler function, c->data is assigned a new value. Since this value is an ngx_http_lua_socket_tcp_upstream_t object rather than an ngx_http_request_t object, treating c->data as an ngx_http_request_t leads to accesses at incorrect memory addresses. We therefore continued executing the rc command to trace how c->data had changed.

```gdb
99% 1,471,080> bt
#0  ngx_http_lua_socket_resolve_retval_handler (r=0x7f60a3838050, u=0x40c1ab88, L=0x40c1ab00)
    at addons/lua-nginx-module/src/ngx_http_lua_socket_tcp.c:1176
#1  0x000000000058b3b9 in ngx_http_lua_socket_tcp_connect (L=0x40c1ab00) at addons/lua-nginx-module/src/ngx_http_lua_socket_tcp.c:744
#2  0x0000000000647817 in lj_BC_FUNCC ()
#3  0x0000000000573c95 in ngx_http_lua_run_thread (L=L@entry=0x41afc378, r=r@entry=0x7f60a3838050, ctx=ctx@entry=0x7f60a3917e68,
    nrets=nrets@entry=0) at addons/lua-nginx-module/src/ngx_http_lua_util.c:1073
#4  0x0000000000578acc in ngx_http_lua_rewrite_by_chunk (L=0x41afc378, r=0x7f60a3838050)
    at addons/lua-nginx-module/src/ngx_http_lua_rewriteby.c:338
#5  0x0000000000578ddd in ngx_http_lua_rewrite_handler (r=0x7f60a3838050) at addons/lua-nginx-module/src/ngx_http_lua_rewriteby.c:167
#6  0x00000000004c7763 in ngx_http_core_rewrite_phase (r=0x7f60a3838050, ph=0x7f60a2f50768) at src/http/ngx_http_core_module.c:1123
#7  0x00000000004c2305 in ngx_http_core_run_phases (r=r@entry=0x7f60a3838050) at src/http/ngx_http_core_module.c:1069
#8  0x00000000004c23ea in ngx_http_handler (r=r@entry=0x7f60a3838050) at src/http/ngx_http_core_module.c:1052
#9  0x00000000004cfca1 in ngx_http_process_request (r=r@entry=0x7f60a3841050) at src/http/ngx_http_request.c:2780
#10 0x00000000004d06f7 in ngx_http_process_request_headers (rev=rev@entry=0x7f5fbfc010e0) at src/http/ngx_http_request.c:1995
#11 0x00000000004d10fd in ngx_http_process_request_line (rev=0x7f5fbfc010e0) at src/http/ngx_http_request.c:1473
#12 0x00000000004b2d89 in ngx_epoll_process_events (cycle=0x7f60a3888b90, timer=<optimized out>, flags=<optimized out>)
```

Second trigger: memory reuse.
The program triggered the watchpoint again in the call chain ngx_event_connect_peer -> ngx_get_connection. In ngx_get_connection, a new connection object is obtained from Nginx’s connection pool, and during its initialization ngx_memzero clears the whole memory block, including the destroyed flag, to zero. We continued executing the rc command to observe earlier changes to c->data.
```gdb
#0 0x0000000000494e1c in memset (__len=360, __ch=0, __dest=0x7f5fc3a010f8) at /usr/include/bits/string3.h:84
#1 ngx_get_connection (s=s@entry=14, log=0x7f60a31c12c0) at src/core/ngx_connection.c:1190
#2 0x00000000004a8e26 in ngx_event_connect_peer (pc=pc@entry=0x40c1abc8) at src/event/ngx_event_connect.c:53
#3 0x0000000000587cb3 in ngx_http_lua_socket_resolve_retval_handler (r=0x7f60a3838050, u=0x40c1ab88, L=0x40c1ab00)
at addons/lua-nginx-module/src/ngx_http_lua_socket_tcp.c:1131
#4 0x000000000058b3b9 in ngx_http_lua_socket_tcp_connect (L=0x40c1ab00) at addons/lua-nginx-module/src/ngx_http_lua_socket_tcp.c:744
#5 0x0000000000647817 in lj_BC_FUNCC ()
#6 0x0000000000573c95 in ngx_http_lua_run_thread (L=L@entry=0x41afc378, r=r@entry=0x7f60a3838050, ctx=ctx@entry=0x7f60a3917e68,
nrets=nrets@entry=0) at addons/lua-nginx-module/src/ngx_http_lua_util.c:1073
#7 0x0000000000578acc in ngx_http_lua_rewrite_by_chunk (L=0x41afc378, r=0x7f60a3838050)
at addons/lua-nginx-module/src/ngx_http_lua_rewriteby.c:338
#8 0x0000000000578ddd in ngx_http_lua_rewrite_handler (r=0x7f60a3838050) at addons/lua-nginx-module/src/ngx_http_lua_rewriteby.c:167
#9 0x00000000004c7763 in ngx_http_core_rewrite_phase (r=0x7f60a3838050, ph=0x7f60a2f50768) at src/http/ngx_http_core_module.c:1123
#10 0x00000000004c2305 in ngx_http_core_run_phases (r=r@entry=0x7f60a3838050) at src/http/ngx_http_core_module.c:1069
#11 0x00000000004c23ea in ngx_http_handler (r=r@entry=0x7f60a3838050) at src/http/ngx_http_core_module.c:1052
#12 0x00000000004cfca1 in ngx_http_process_request (r=r@entry=0x7f60a3841050) at src/http/ngx_http_request.c:2780
#13 0x00000000004d06f7 in ngx_http_process_request_headers (rev=rev@entry=0x7f5fbfc010e0) at src/http/ngx_http_request.c:1995
#14 0x00000000004d10fd in ngx_http_process_request_line (rev=0x7f5fbfc010e0) at src/http/ngx_http_request.c:1473
#15 0x00000000004b2d89 in ngx_epoll_process_events (cycle=0x7f60a3888b90, timer=<optimized out>, flags=<optimized out>)
```
Third trigger: the connection is released.

The program triggered the watchpoint in the call chain ngx_http_close_connection -> ngx_free_connection. This indicates that the connection pointed to by c was normally closed and released. Before closing, the c->destroyed flag was set to 1.

```gdb
#0  0x000000000049511b in ngx_free_connection (c=c@entry=0x7f5fc3a010f8) at src/core/ngx_connection.c:1220
#1  0x0000000000495551 in ngx_close_connection (c=c@entry=0x7f5fc3a010f8) at src/core/ngx_connection.c:1292
#2  0x00000000004cbf0d in ngx_http_close_connection (c=0x7f5fc3a010f8) at src/http/ngx_http_request.c:4575
#3  0x00000000004c8be8 in ngx_http_core_content_phase (r=0x7f60a3841050, ph=0x7f60a2f509c0) at src/http/ngx_http_core_module.c:1388
#4  0x00000000004c2305 in ngx_http_core_run_phases (r=r@entry=0x7f60a3841050) at src/http/ngx_http_core_module.c:1069
#5  0x00000000004c23ea in ngx_http_handler (r=r@entry=0x7f60a3841050) at src/http/ngx_http_core_module.c:1052
#6  0x00000000004cfc69 in ngx_http_process_request (r=r@entry=0x7f60a3841050) at src/http/ngx_http_request.c:2764
#7  0x00000000004d06f7 in ngx_http_process_request_headers (rev=rev@entry=0x7f5fbfc010e0) at src/http/ngx_http_request.c:1995
#8  0x00000000004d10fd in ngx_http_process_request_line (rev=0x7f5fbfc010e0) at src/http/ngx_http_request.c:1473
#9  0x00000000004b2d89 in ngx_epoll_process_events (cycle=0x7f60a3888b90, timer=<optimized out>, flags=<optimized out>)
```
At this point, the entire chain of events became clear: in the processing flow of the ngx_http_process_request_headers function, the current connection c was first closed, and its destroyed flag was set to 1. However, due to the customer’s secondary development modifications to the Nginx framework, subsequent code immediately requested a new connection, and Nginx’s connection pool happened to hand back the just-released memory block. The memory pointed to by the original c pointer was thus reinitialized, with the destroyed flag reset to 0.
When the logic of ngx_http_process_request_headers eventually calls ngx_http_run_posted_requests(c), the if (c->destroyed) guard no longer catches the stale connection, because the flag has been cleared to zero. The function continues executing and accesses c->data (i.e., r), but by then c->data belongs to a new connection whose content and context are completely mismatched, ultimately leading to illegal memory access and a crash.
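The free-then-reuse hazard can be reproduced with a toy model. The sketch below is our own simplification, not Nginx’s actual code: the real logic lives in ngx_free_connection() and ngx_get_connection() in src/core/ngx_connection.c, where ngx_memzero() wipes the reused slot, including the destroyed flag.

```c
#include <stddef.h>
#include <string.h>

/* A toy model (not Nginx's real code) of the connection-pool reuse
 * that defeated the c->destroyed guard. */
typedef struct {
    void     *data;       /* for HTTP, points at the ngx_http_request_t */
    unsigned  destroyed;  /* checked by ngx_http_run_posted_requests() */
} conn_t;

static conn_t  pool[1];       /* a one-slot "connection pool" */
static conn_t *free_list[1];
static int     free_count;

/* Models ngx_close_connection() + ngx_free_connection(): mark the
 * connection destroyed and return its slot to the free list. */
static void free_connection(conn_t *c)
{
    c->destroyed = 1;
    free_list[free_count++] = c;
}

/* Models ngx_get_connection(): pop a free slot and zero it -- the
 * ngx_memzero() step that also clears the previous owner's
 * destroyed flag. */
static conn_t *get_connection(void)
{
    conn_t *c = free_list[--free_count];
    memset(c, 0, sizeof(*c));
    return c;
}
```

A stale pointer into the pool sees destroyed == 0 again once the slot is reused, so a guard like the one in ngx_http_run_posted_requests() lets execution proceed into a mismatched c->data.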
How Secondary Development Accidentally Broke Nginx’s Lifecycle Management
The root cause of the problem lies in the customer’s secondary development code breaking Nginx’s originally rigorous connection lifecycle management mechanism. Specifically, within the processing domain of the ngx_http_process_request_headers function, a conflict occurred where the same connection object was “released and immediately reused.”
The call chain is as follows:
1. ngx_http_process_request_headers -> ngx_http_process_request -> … -> ngx_http_close_connection: the connection is closed, and c->destroyed is set to 1.
2. ngx_http_process_request_headers -> ngx_http_process_request -> … -> ngx_get_connection: due to the secondary development logic, the code requests a new connection here, which happens to reuse the just-released memory, resetting c->destroyed to zero.
3. ngx_http_process_request_headers -> ngx_http_run_posted_requests(c): the c->destroyed check no longer fires, the contaminated c->data is used, and the program crashes.
Why OpenResty XRay Is Your Best Choice
In the rapidly changing online business environment, every service crash potentially means user loss and revenue decline. The memory corruption issue in this case, caused by improper secondary development, is a “hidden killer” lurking deep within the system: it operates stealthily yet destructively, and can leave even the most capable development teams helpless after weeks of investigation.
Traditional methods like log analysis and code review often prove ineffective when facing complex problems where the “crime scene” has already been compromised. This is precisely where OpenResty XRay demonstrates its value—we provide not just a tool, but an intelligent diagnostic system oriented toward the future.
Say Goodbye to Needle-in-Haystack Searches, Go Straight to the Root Cause

Traditional troubleshooting methods are like searching for a needle in a haystack: time-consuming, labor-intensive, and uncertain in outcome. OpenResty XRay leverages its powerful automated root cause analysis capabilities to penetrate through layers of confusion, precisely locking onto the instantaneous memory operations that cause crashes and clearly presenting the complete logical chain from “connection contamination” to “program crash.” It reduces your team’s weeks of work to just minutes of precise analysis.
From “Firefighting Team” to “Architect,” Unleash Your Team’s Potential

Free your valuable development resources from endless online “firefighting.” OpenResty XRay can quickly locate and resolve these deeply hidden bugs, allowing your team to focus on business innovation and architectural optimization, investing energy in work that creates greater value rather than passively responding to failures.
Safeguard Core Business, Avoid Major Losses

Every stability challenge is a test of your business’s core competitiveness. OpenResty XRay provides exactly this kind of enterprise-level stability assurance. Through deep, professional insights, we help you identify and eliminate potential risks in advance, ensuring your core business runs robustly 24/7 and avoiding the immeasurable business impact and brand reputation damage caused by underlying technical issues.
Choosing OpenResty XRay means choosing a professional commitment to core business stability. We are dedicated to becoming your most reliable technical partner, using cutting-edge dynamic tracing technology to protect every critical moment of your business.
What is OpenResty XRay
OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the contexts.
If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!
About The Author
Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.
Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at internationally renowned tech companies such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.