How OpenResty XRay Enables Full-Stack Dynamic Tracing in Production
The challenge of troubleshooting modern software systems is fundamentally a visibility problem. As systems grow deeper in complexity—spanning business logic, language runtimes, system-level middleware, the OS kernel, and the network stack—engineers’ ability to observe and understand what is actually happening in production continues to erode.
Traditional approaches face two fundamental tensions: static instrumentation requires engineers to anticipate what data they will need before a problem even occurs, yet many critical issues only manifest under real production traffic at specific concurrency levels, with occurrence rates as low as one in a thousand. Offline debugging takes the opposite approach—it cuts off the live traffic needed to reproduce the problem, and GDB’s single-step execution alters program timing, causing timing-sensitive bugs and race conditions to vanish during the debugging session itself.
Neither path scales to production. This reality drove the evolution of dynamic tracing tools. Yet every existing framework carries its own design constraints, and it is precisely these constraints that define the next frontier that dynamic tracing technology must break through.
Where Do Existing Dynamic Tracing Frameworks Fall Short in Production?
The following analysis is not a judgment of each framework’s engineering quality. Rather, it examines the inherent technical boundaries imposed by their design trade-offs, evaluated through the demanding lens of production requirements.
SystemTap: SystemTap works by JIT-compiling .stp scripts into C code and building them into a temporary Linux kernel module on the fly. This process introduces two critical constraints in production:
- Environment dependencies: The target host must have a complete C compiler toolchain (e.g., GCC) installed, along with `kernel-devel` headers that exactly match the running kernel version. This adds deployment complexity and ongoing environment management overhead.
- Startup latency: Every execution involves compilation, linking, and kernel module loading—a process that can take seconds to minutes. For incident response scenarios where rapid diagnosis is critical, this latency is a significant barrier.
DTrace: The D language was intentionally designed without loop constructs—a deliberate choice to guarantee that probe code running in kernel space always terminates in a finite number of steps, which is the cornerstone of its safety model. However, this design comes with trade-offs:
- Limited expressiveness: When traversal of unbounded data structures (such as linked lists or hash tables) is required, the lack of loops makes it difficult to implement complex iteration and aggregation logic.
- Manual symbol management for user-space tracing: DTrace requires manually loading and managing debug symbols for user-space process symbol resolution, with no automated mechanism—adding operational complexity and increasing the risk of error.
eBPF: To guarantee kernel security and stability, all eBPF programs must pass rigorous verification by the kernel’s built-in verifier. This safety model comes with inherent technical constraints:
- Resource limits: Programs must operate within limited stack space (e.g., 512 bytes), and there is a ceiling on total instruction count—though newer kernels have significantly raised this limit to 1 million instructions. These constraints cap the logical complexity achievable within a single eBPF program.
- Bounded loops: The verifier must statically prove that all loops terminate within a finite number of iterations, making direct traversal of data structures with unknown lengths extremely challenging. These constraints are fundamental to eBPF’s security guarantees, but they also mean that complex analysis logic often needs to be decomposed or offloaded to user space.
GDB / LLDB: These debuggers rely on the ptrace system call to control and inspect target processes—an inherently invasive mechanism:
- Stop-the-world overhead: To read memory or register state, `ptrace` must halt the target process with `SIGSTOP`. For high-concurrency production services, this global pause can cause latency spikes, request timeouts, or full outages—unacceptable in any production context.
- Timing perturbation: Single-step execution and breakpoint debugging fundamentally alter a program’s execution timing, making concurrency issues such as race conditions and deadlocks extremely difficult to reproduce and diagnose. Furthermore, internal implementation details (e.g., GDB’s use of `longjmp` for error handling) are not designed for high-throughput, low-latency production analysis.
perf: perf is architected around the CPU’s Performance Monitoring Unit (PMU), optimized for efficiently collecting hardware performance counters (PMCs) and performing CPU-level statistical sampling.
- CPU-centric perspective: `perf` excels at answering “what is the CPU busy with?”—cache hit rates, branch misprediction rates, and similar hardware-level metrics. However, it offers limited capability when it comes to capturing high-level software context.
- No application-layer semantics: `perf` cannot easily inspect or interpret data structure state within application code (e.g., the value of a specific variable inside a function). Correlating low-level hardware events with high-level application logic requires substantial additional manual analysis.
How Did OpenResty XRay Rethink Dynamic Tracing at the Architecture Level?
OpenResty XRay’s response to these constraints is not a series of point fixes—it is a ground-up, architectural redesign of the tracing infrastructure itself.
Why Did We Build a Custom uretprobes Implementation to Avoid Kernel Stack Corruption?
The Linux kernel’s native uretprobes mechanism captures function return events by directly modifying the target process’s runtime stack. This corrupts stack unwinding—the mechanism that underlies call chain reconstruction in flame graphs and core dump analysis—making analysis results unreliable.
OpenResty XRay implements its own uretprobe-equivalent mechanism that captures function returns without modifying the target process’s runtime stack, completely eliminating this fundamental flaw in the kernel’s native implementation.
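To make the failure mode concrete, here is a toy Python model (not real kernel or XRay code) of why patching return addresses breaks unwinding. The frame layout, the `trampoline` name, and the side-table approach are all illustrative assumptions; only the general mechanism (kernel uretprobes overwrite the saved return address with a trampoline) comes from the text above.

```python
def unwind(stack):
    """A simplified unwinder: read the saved return address in each frame."""
    return [frame["ret"] for frame in stack]

# Toy call stack for main -> handler -> parse; each frame stores its caller.
stack = [
    {"func": "parse", "ret": "handler"},
    {"func": "handler", "ret": "main"},
]

# Kernel-style uretprobe: overwrite the saved return address with a
# trampoline so control returns to the probe. The real caller disappears
# from the stack itself, which is what confuses unwinders.
probed = [dict(f) for f in stack]
shadow = {0: probed[0]["ret"]}      # side table remembering the original
probed[0]["ret"] = "trampoline"

print(unwind(probed))  # the unwinder now sees the trampoline, not 'handler'
print(unwind(stack))   # the untouched stack still unwinds cleanly

# A non-intrusive design can consult a side table instead of patching the
# stack, so call-chain reconstruction (flame graphs, core dumps) stays intact.
print(shadow[0])
```

The point of the sketch: once the on-stack return address is replaced, any consumer that walks the stack (a profiler, a core dump analyzer) reconstructs a wrong chain, while out-of-band bookkeeping leaves the stack readable.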
What Is the Y Language and How Does It Create a Unified Debugging Layer?
Today’s dynamic tracing ecosystem is deeply fragmented. DTrace has its own D language, SystemTap has its own scripting syntax, GDB and LLDB offer mutually incompatible Python APIs, and eBPF programs must be written in a restricted subset of C and pass through the kernel verifier.
This heterogeneity creates two compounding problems:
- Steep learning curve: Engineers must master a distinct DSL and programming model for each target environment.
- Zero code portability: Tracing scripts written for one platform are nearly impossible to migrate to another, forcing teams to rewrite diagnostic logic from scratch for every new environment.
OpenResty XRay addresses this with Y Language—a high-level abstraction layer that parses tracing logic and transpiles it to the native formats of multiple backend frameworks, including:
- DTrace D language scripts
- SystemTap scripts
- eBPF bytecode (loaded via toolchains such as BCC)
- GDB and LLDB Python extension scripts
With this approach, engineers write and maintain a single set of high-level tracing logic. This unifies the development experience and enables true “write once, run anywhere” for tracing code—significantly lowering the barrier to dynamic analysis in complex, heterogeneous environments and reducing long-term maintenance costs.
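The transpilation idea can be sketched in a few lines of Python. This is emphatically not the real Y compiler—the probe spec, the function names, and the backend templates are all invented, and the emitted DSL snippets are heavily simplified—but it shows the shape of “one abstract description, many backend syntaxes”:

```python
# A toy "write once, target many backends" illustration. All backend
# templates below are simplified approximations of each DSL's syntax.

PROBE = {"function": "malloc"}  # abstract intent: count calls to malloc

def to_systemtap(p):
    return f'probe process.function("{p["function"]}") {{ calls <<< 1 }}'

def to_dtrace(p):
    return f'pid$target::{p["function"]}:entry {{ @calls = count(); }}'

def to_bpftrace(p):
    return f'uprobe:libc:{p["function"]} {{ @calls = count(); }}'

# One spec, three concrete tracing scripts:
for backend in (to_systemtap, to_dtrace, to_bpftrace):
    print(backend(PROBE))
```

A real compiler must of course handle far more than templating—type checking, backend capability differences, and symbol resolution—but the maintenance win is the same: the diagnostic logic lives once, above the backends.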
Live Analysis vs. Post-Mortem Analysis: How Does OpenResty XRay Handle Both?
Production troubleshooting typically falls into two scenarios: the process is still running (live analysis) or it has crashed and left a core dump (post-mortem analysis). Traditionally, these two situations demand entirely different toolchains—switching between them means context-switching between different mental models and scripting languages.
By supporting backends such as GDB, the Y Language compiler brings both scenarios under a single programming model: the same tracing logic can be used for non-intrusive real-time sampling of live processes and for automated deep parsing of complex heap data structures in core dumps. One language, two analysis paths.
What Is an Automated Debug Symbol Library and How Does It Work?
The effectiveness of any advanced dynamic tracing tool ultimately depends on its ability to resolve raw memory addresses into meaningful code-level context. This symbolication process requires accurate debug symbol information—but in typical production deployments, binaries and their corresponding debug symbols are stripped and distributed separately, for reasons of size and security.
This creates a pervasive operational bottleneck: when an incident occurs, engineers must pause their analysis and manually sift through vast package repositories to locate, match, download, and deploy the exact symbol files for the specific software version under investigation. This is a brittle, time-consuming workflow that resists automation—and it is one of the core barriers to broad enterprise adoption of dynamic tracing.
OpenResty XRay treats this as an infrastructure problem and solves it at scale. We have designed and built a global, distributed symbol collection and indexing system that continuously ingests binaries and debug information from upstream open-source projects, official Linux distribution repositories, and other sources. The result is a structured symbol knowledge base at the tens-of-terabytes scale.
When an engineer initiates tracing on any supported target process, OpenResty XRay automatically and transparently resolves symbols with zero manual intervention—turning deep, on-demand analysis of production systems into a low-friction, always-available capability, and significantly reducing Mean Time to Resolution (MTTR).
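The symbolication step itself is worth making concrete. The sketch below shows the core lookup—mapping a raw address to its enclosing function via a sorted symbol table, in the spirit of `addr2line`. The addresses and function names are fabricated for illustration; real symbolication also handles ranges, inlining, and source lines.

```python
import bisect

# (start_address, name), sorted by start address — a miniature symbol table.
symbols = [
    (0x1000, "ngx_event_process"),
    (0x1400, "ngx_http_handler"),
    (0x1a00, "lua_resume"),
]
starts = [s[0] for s in symbols]

def resolve(addr):
    """Return the symbol with the greatest start address <= addr."""
    i = bisect.bisect_right(starts, addr) - 1
    return symbols[i][1] if i >= 0 else "??"

print(resolve(0x1405))  # lands inside ngx_http_handler
print(resolve(0x10))    # below all known symbols: unresolved
```

Without the matching debug symbols, every frame degenerates to the `??` case—which is exactly why an automated, version-exact symbol library removes the biggest manual step from production analysis.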
What Is a Full-Stack Flame Graph and How Do You Use It to Diagnose Performance Bottlenecks?
Flame graphs are among the most intuitive outputs in dynamic tracing, but the depth of “full-stack” coverage varies significantly across tools. OpenResty XRay supports multiple sampling dimensions simultaneously and provides vertical visibility spanning the complete call chain—from the application layer all the way down to the kernel.
On-CPU flame graphs identify code paths where the program is actively consuming CPU time. They are ideal for diagnosing CPU spikes and abnormal throughput drops. Common findings include repeated regex compilation that bypasses the cache and accidentally retained debug code paths.
Off-CPU flame graphs capture wait time when a process is not scheduled on the CPU, surfacing root causes such as lock contention (e.g., `sem_wait`), file I/O blocking, network waits, or scheduler preemption. Many “slow requests, low CPU” problems trace their root cause directly to the off-CPU layer.

Memory allocation flame graphs and object reference flame graphs profile memory behavior: the former pinpoints high-frequency allocation hot paths; the latter visualizes object retention chains, making it possible to track down slow leaks within custom memory pools—leaks that bypass the system allocator and therefore evade tools like Valgrind.
File I/O flame graphs bring disk access patterns into the sampling scope, providing direct evidence when diagnosing latency caused by disk configuration or I/O behavior.
Vertically, OpenResty XRay flame graphs span the full software stack: from Lua or Java business logic, through Nginx/Envoy and data-plane middleware, down to the OS kernel and network stack, all the way to blocking rooted in disk configuration. This cross-layer visibility is especially critical for diagnosing issues where symptoms surface at the application layer but the root cause lives in the system layer.
The entire sampling process requires no code changes and no service restarts, and the overhead imposed on the target application is negligible.
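Mechanically, every flame graph variant above rests on the same aggregation step: each sample is a call chain, and identical chains are folded into counts (the “folded stack” format that flame graph renderers consume). A minimal sketch, with invented sample data:

```python
from collections import Counter

# Each sample is one captured call chain, outermost frame first.
samples = [
    ("main", "handle_request", "regex_compile"),
    ("main", "handle_request", "regex_compile"),
    ("main", "handle_request", "send_response"),
]

# Fold identical chains: "frame;frame;frame count" per line.
folded = Counter(";".join(stack) for stack in samples)
for chain, count in folded.most_common():
    print(f"{chain} {count}")
```

The dimension being sampled (CPU time, off-CPU wait time, bytes allocated) changes what a sample represents, but the folding and rendering pipeline stays the same.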
How Is OpenResty XRay Expanding Its Observability Coverage Over Time?
OpenResty XRay’s architecture is designed to extend naturally to new technology stacks. The platform has added non-intrusive analysis capabilities for Java applications and Envoy—two technology stacks that are widely deployed in production yet notoriously difficult to instrument with traditional tracing tools.
How Do You Profile Java Applications Without Any Code Instrumentation?
Achieving function-level observability in Java production environments has always been a difficult engineering challenge. Every mainstream approach carries a cost: adding manual logging requires redeployment; Java Agents intercept at class-loading time, introducing startup overhead and potential compatibility conflicts; APM frameworks offer pre-defined instrumentation, not on-demand sampling. The more fundamental problem is that all three approaches require deployment decisions to be made before the fault ever occurs.
OpenResty XRay’s Java function probes bypass this constraint entirely—no bytecode modification, no Java Agent, no JVM restart. Through out-of-process sampling, the platform observes the running JVM in real time and retrieves function-level data on demand.
Memory analysis works the same way. Traditional heap dumps have two well-known shortcomings: they trigger stop-the-world pauses during heap serialization, and offline snapshots have limited value for diagnosing slow leaks or transient spikes. OpenResty XRay requires no JVM pause and no Safepoint dependency. It covers common memory issues across three dimensions:
- GC object reference analysis: Reconstructs object retention paths to identify the root cause of unexpected memory residency
- GC object allocation count analysis: Ranks code paths by allocation frequency to pinpoint high-frequency sources of GC pressure
- GC object allocation size analysis: Identifies large object allocation sites to inform heap layout optimization
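The difference between the allocation-count and allocation-size views can be shown with a toy model: the same stream of allocation events, ranked two ways. The event data and site names are fabricated for illustration—many small allocations dominate one ranking, a single large object dominates the other:

```python
from collections import Counter

# (allocation site, object size in bytes) — fabricated sample events.
events = [
    ("OrderParser.parse", 64),
    ("OrderParser.parse", 64),
    ("OrderParser.parse", 64),
    ("ReportJob.render", 8_388_608),  # one 8 MiB object
]

by_count = Counter(site for site, _ in events)  # GC-pressure view
by_size = Counter()                             # heap-footprint view
for site, size in events:
    by_size[site] += size

print(by_count.most_common(1))  # hottest allocator by frequency
print(by_size.most_common(1))   # hottest allocator by bytes
```

The two rankings point at different sites, which is why both dimensions are needed: frequent small allocations drive GC pressure, while rare large allocations drive heap footprint.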
How Do You Diagnose Envoy Lua Performance Bottlenecks in Production?
Envoy handles significant traffic processing logic in modern service meshes. When performance issues emerge in its Lua extension layer, the diagnostic path is rarely obvious—Envoy’s own metrics expose only aggregated data, making it nearly impossible to trace the problem to specific Lua code paths.
OpenResty XRay provides real-time, Lua-level flame graph sampling for live Envoy instances, covering both CPU and off-CPU dimensions—no Envoy restart required, no configuration changes, deployable directly in core traffic environments. Combined with built-in intelligent diagnostics, the sampling results directly pinpoint the specific bottleneck code paths.
Further Reading and Real-World Case Studies
In enterprise IT, the true business value of a technology must be proven under real production conditions. The following case studies are drawn from actual production incidents at leading technology companies, demonstrating how OpenResty XRay leverages non-intrusive dynamic tracing to deliver full-stack root cause analysis—without interrupting services, without modifying code, and with negligible performance overhead—across a wide range of technology stacks and complex business scenarios.
- Live Production Debugging Without Downtime: “An Nginx Memory Leak That Couldn’t Be Fixed by a Restart: How We Traced It in Production”
This case challenges the traditional ops instinct of “when in doubt, restart.” Under strict SLA constraints that prohibited both service interruption and code changes, it demonstrates how to capture and pinpoint a hidden infrastructure-level (Nginx) memory leak directly in production—with millisecond-level sampling overhead—without sacrificing business continuity.
- Unified Live and Post-Mortem Analysis: “From Crash to Root Cause: How OpenResty XRay Decoded an Nginx Memory Corruption Incident”
A deep dive into core dump analysis and call chain reconstruction. Demonstrates how to transform post-incident guesswork into a systematic, closed-loop attribution workflow—from failure symptoms to source-code-level root causes.
- On-CPU Flame Graphs and Cross-Layer Penetration: “Dual Concurrent Bottlenecks? How OpenResty XRay’s Multi-Dimensional Analysis Cracked a Complex Performance Problem”
Bridges the observability gap between the C/C++ system layer and the application layer. Using system-level CPU flame graphs for cross-stack penetration, this case is backed by hard data to support architectural performance tuning and compute cost reduction.
- Tracing Slow Leaks in Custom Memory Pools: “How to Use OpenResty XRay to Track a Memory Leak Caused by an LRU Cache”
Traditional profilers are typically powerless against dynamic languages or custom memory allocators. This case demonstrates how high-resolution memory flame graphs bypass those limitations to precisely identify and eliminate a chronic memory leak deep in Lua business logic—specifically, an LRU cache implementation.
- Minute-Level Resolution of a Large-Scale Production Incident: “How OpenResty XRay Diagnosed and Resolved a Major Production Incident at Bilibili”
A P0 incident affecting hundreds of millions of users put enterprise-grade observability tools to the ultimate test. With zero intervention—no downtime, no restarts—OpenResty XRay cut MTTR from hours to minutes, directly recovering business revenue and restoring user trust.
What is OpenResty XRay
OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context.
If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!
About The Author
Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.
Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing”, and “machine coding”, with over 22 years of programming and 16 years of open-source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software platform.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.