How OpenResty XRay Enables Full-Stack Dynamic Tracing in Production
The challenge of troubleshooting modern software systems is fundamentally a visibility problem. As systems grow deeper in complexity—spanning business logic, language runtimes, system-level middleware, the OS kernel, and the network stack—engineers’ ability to observe and understand what is actually happening in production continues to erode.
Traditional approaches face two fundamental tensions: static instrumentation requires engineers to anticipate what data they will need before a problem even occurs, yet many critical issues only manifest under real production traffic at specific concurrency levels, with occurrence rates as low as one in a thousand. Offline debugging takes the opposite approach—it cuts off the live traffic needed to reproduce the problem, and GDB’s single-step execution alters program timing, causing timing-sensitive bugs and race conditions to vanish during the debugging session itself.
Neither path scales to production. This reality drove the evolution of dynamic tracing tools. Yet every existing framework carries its own design constraints, and it is precisely these constraints that define the next frontier that dynamic tracing technology must break through.
Where Do Existing Dynamic Tracing Frameworks Fall Short in Production?
The following analysis is not a judgment of each framework’s engineering quality. Rather, it examines the inherent technical boundaries imposed by their design trade-offs, evaluated through the demanding lens of production requirements.
SystemTap: SystemTap works by JIT-compiling .stp scripts into C code and building them into a temporary Linux kernel module on the fly. This process introduces two critical constraints in production:
- Environment dependencies: The target host must have a complete C compiler toolchain (e.g., GCC) installed, along with `kernel-devel` headers that exactly match the running kernel version. This adds deployment complexity and ongoing environment management overhead.
- Startup latency: Every execution involves compilation, linking, and kernel module loading—a process that can take seconds to minutes. For incident response scenarios where rapid diagnosis is critical, this latency is a significant barrier.
DTrace: The D language was intentionally designed without loop constructs—a deliberate choice to guarantee that probe code running in kernel space always terminates in a finite number of steps, which is the cornerstone of its safety model. However, this design comes with trade-offs:
- Limited expressiveness: When traversal of unbounded data structures (such as linked lists or hash tables) is required, the lack of loops makes it difficult to implement complex iteration and aggregation logic.
- Manual symbol management for user-space tracing: DTrace requires manually loading and managing debug symbols for user-space process symbol resolution, with no automated mechanism—adding operational complexity and increasing the risk of error.
eBPF: To guarantee kernel security and stability, all eBPF programs must pass rigorous verification by the kernel’s built-in verifier. This safety model comes with inherent technical constraints:
- Resource limits: Programs must operate within limited stack space (e.g., 512 bytes), and there is a ceiling on total instruction count—though newer kernels have significantly raised this limit to 1 million instructions. These constraints cap the logical complexity achievable within a single eBPF program.
- Bounded loops: The verifier must statically prove that all loops terminate within a finite number of iterations, making direct traversal of data structures with unknown lengths extremely challenging. These constraints are fundamental to eBPF’s security guarantees, but they also mean that complex analysis logic often needs to be decomposed or offloaded to user space.
GDB / LLDB: These debuggers rely on the ptrace system call to control and inspect target processes—an inherently invasive mechanism:
- Stop-the-world overhead: To read memory or register state, `ptrace` must halt the target process with `SIGSTOP`. For high-concurrency production services, this global pause can cause latency spikes, request timeouts, or full outages—unacceptable in any production context.
- Timing perturbation: Single-step execution and breakpoint debugging fundamentally alter a program’s execution timing, making concurrency issues such as race conditions and deadlocks extremely difficult to reproduce and diagnose. Furthermore, internal implementation details (e.g., GDB’s use of `longjmp` for error handling) are not designed for high-throughput, low-latency production analysis.
perf: perf is architected around the CPU’s Performance Monitoring Unit (PMU), optimized for efficiently collecting hardware performance counters (PMCs) and performing CPU-level statistical sampling.
- CPU-centric perspective: `perf` excels at answering “what is the CPU busy with?”—cache hit rates, branch misprediction rates, and similar hardware-level metrics. However, it offers limited capability when it comes to capturing high-level software context.
- No application-layer semantics: `perf` cannot easily inspect or interpret data structure state within application code (e.g., the value of a specific variable inside a function). Correlating low-level hardware events with high-level application logic requires substantial additional manual analysis.
How Did OpenResty XRay Rethink Dynamic Tracing at the Architecture Level?
OpenResty XRay’s response to these constraints is not a series of point fixes—it is a ground-up, architectural redesign of the tracing infrastructure itself.
Why Did We Build a Custom uretprobes Implementation to Avoid Kernel Stack Corruption?
The Linux kernel’s native uretprobes mechanism captures function return events by directly modifying the target process’s runtime stack. This corrupts stack unwinding—the mechanism that underlies call chain reconstruction in flame graphs and core dump analysis—making analysis results unreliable.
OpenResty XRay implements its own uretprobe-equivalent mechanism that captures function returns without modifying the target process’s runtime stack, completely eliminating this fundamental flaw in the kernel’s native implementation.
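To make the failure mode concrete, here is a toy Python model (not real kernel or XRay code) of why patching return addresses breaks unwinding. The frame layout, the `trampoline` name, and the side-table approach are all illustrative assumptions; only the general mechanism (kernel uretprobes overwrite the saved return address with a trampoline) comes from the text above.

```python
def unwind(stack):
    """A simplified unwinder: read the saved return address in each frame."""
    return [frame["ret"] for frame in stack]

# Toy call stack for main -> handler -> parse; each frame stores its caller.
stack = [
    {"func": "parse", "ret": "handler"},
    {"func": "handler", "ret": "main"},
]

# Kernel-style uretprobe: overwrite the saved return address with a
# trampoline so control returns to the probe. The real caller disappears
# from the stack itself, which is what confuses unwinders.
probed = [dict(f) for f in stack]
shadow = {0: probed[0]["ret"]}      # side table remembering the original
probed[0]["ret"] = "trampoline"

print(unwind(probed))  # the unwinder now sees the trampoline, not 'handler'
print(unwind(stack))   # the untouched stack still unwinds cleanly

# A non-intrusive design can consult a side table instead of patching the
# stack, so call-chain reconstruction (flame graphs, core dumps) stays intact.
print(shadow[0])
```

The point of the sketch: once the on-stack return address is replaced, any consumer that walks the stack (a profiler, a core dump analyzer) reconstructs a wrong chain, while out-of-band bookkeeping leaves the stack readable.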
What Is the Y Language and How Does It Create a Unified Debugging Layer?
Today’s dynamic tracing ecosystem is deeply fragmented. DTrace has its own D language, SystemTap has its own scripting syntax, GDB and LLDB offer mutually incompatible Python APIs, and eBPF programs must be written in a restricted subset of C and pass through the kernel verifier.
This heterogeneity creates two compounding problems:
- Steep learning curve: Engineers must master a distinct DSL and programming model for each target environment.
- Zero code portability: Tracing scripts written for one platform are nearly impossible to migrate to another, forcing teams to rewrite diagnostic logic from scratch for every new environment.
OpenResty XRay addresses this with Y Language—a high-level abstraction layer that parses tracing logic and transpiles it to the native formats of multiple backend frameworks, including:
- DTrace D language scripts
- SystemTap scripts
- eBPF bytecode (loaded via toolchains such as BCC)
- GDB and LLDB Python extension scripts
With this approach, engineers write and maintain a single set of high-level tracing logic. This unifies the development experience and enables true “write once, run anywhere” for tracing code—significantly lowering the barrier to dynamic analysis in complex, heterogeneous environments and reducing long-term maintenance costs.
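The transpilation idea can be sketched in a few lines of Python. This is emphatically not the real Y compiler—the probe spec, the function names, and the backend templates are all invented, and the emitted DSL snippets are heavily simplified—but it shows the shape of “one abstract description, many backend syntaxes”:

```python
# A toy "write once, target many backends" illustration. All backend
# templates below are simplified approximations of each DSL's syntax.

PROBE = {"function": "malloc"}  # abstract intent: count calls to malloc

def to_systemtap(p):
    return f'probe process.function("{p["function"]}") {{ calls <<< 1 }}'

def to_dtrace(p):
    return f'pid$target::{p["function"]}:entry {{ @calls = count(); }}'

def to_bpftrace(p):
    return f'uprobe:libc:{p["function"]} {{ @calls = count(); }}'

# One spec, three concrete tracing scripts:
for backend in (to_systemtap, to_dtrace, to_bpftrace):
    print(backend(PROBE))
```

A real compiler must of course handle far more than templating—type checking, backend capability differences, and symbol resolution—but the maintenance win is the same: the diagnostic logic lives once, above the backends.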
Live Analysis vs. Post-Mortem Analysis: How Does OpenResty XRay Handle Both?
Production troubleshooting typically falls into two scenarios: the process is still running (live analysis) or it has crashed and left a core dump (post-mortem analysis). Traditionally, these two situations demand entirely different toolchains—switching between them means context-switching between different mental models and scripting languages.
By supporting backends such as GDB, the Y Language compiler brings both scenarios under a single programming model: the same tracing logic can be used for non-intrusive real-time sampling of live processes and for automated deep parsing of complex heap data structures in core dumps. One language, two analysis paths.
What Is an Automated Debug Symbol Library and How Does It Work?
The effectiveness of any advanced dynamic tracing tool ultimately depends on its ability to resolve raw memory addresses into meaningful code-level context. This symbolication process requires accurate debug symbol information—but in typical production deployments, binaries and their corresponding debug symbols are stripped and distributed separately, for reasons of size and security.
This creates a pervasive operational bottleneck: when an incident occurs, engineers must pause their analysis and manually sift through vast package repositories to locate, match, download, and deploy the exact symbol files for the specific software version under investigation. This is a brittle, time-consuming workflow that resists automation—and it is one of the core barriers to broad enterprise adoption of dynamic tracing.
OpenResty XRay treats this as an infrastructure problem and solves it at scale. We have designed and built a global, distributed symbol collection and indexing system that continuously ingests binaries and debug information from upstream open-source projects, official Linux distribution repositories, and other sources. The result is a structured symbol knowledge base at the tens-of-terabytes scale.
When an engineer initiates tracing on any supported target process, OpenResty XRay automatically and transparently resolves symbols with zero manual intervention—turning deep, on-demand analysis of production systems into a low-friction, always-available capability, and significantly reducing Mean Time to Resolution (MTTR).
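The symbolication step itself is worth making concrete. The sketch below shows the core lookup—mapping a raw address to its enclosing function via a sorted symbol table, in the spirit of `addr2line`. The addresses and function names are fabricated for illustration; real symbolication also handles ranges, inlining, and source lines.

```python
import bisect

# (start_address, name), sorted by start address — a miniature symbol table.
symbols = [
    (0x1000, "ngx_event_process"),
    (0x1400, "ngx_http_handler"),
    (0x1a00, "lua_resume"),
]
starts = [s[0] for s in symbols]

def resolve(addr):
    """Return the symbol with the greatest start address <= addr."""
    i = bisect.bisect_right(starts, addr) - 1
    return symbols[i][1] if i >= 0 else "??"

print(resolve(0x1405))  # lands inside ngx_http_handler
print(resolve(0x10))    # below all known symbols: unresolved
```

Without the matching debug symbols, every frame degenerates to the `??` case—which is exactly why an automated, version-exact symbol library removes the biggest manual step from production analysis.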
What Is a Full-Stack Flame Graph and How Do You Use It to Diagnose Performance Bottlenecks?
Flame graphs are among the most intuitive outputs in dynamic tracing, but the depth of “full-stack” coverage varies significantly across tools. OpenResty XRay supports multiple sampling dimensions simultaneously and provides vertical visibility spanning the complete call chain—from the application layer all the way down to the kernel.
On-CPU flame graphs identify code paths where the program is actively consuming CPU time. They are ideal for diagnosing CPU spikes and abnormal throughput drops. Common findings include repeated regex compilation that bypasses the cache and accidentally retained debug code paths.
Off-CPU flame graphs capture wait time when a process is not scheduled on the CPU, surfacing root causes such as lock contention (e.g., `sem_wait`), file I/O blocking, network waits, or scheduler preemption. Many “slow requests, low CPU” problems trace their root cause directly to the off-CPU layer.

Memory allocation flame graphs and object reference flame graphs profile memory behavior: the former pinpoints high-frequency allocation hot paths; the latter visualizes object retention chains, making it possible to track down slow leaks within custom memory pools—leaks that bypass the system allocator and therefore evade tools like Valgrind.
File I/O flame graphs bring disk access patterns into the sampling scope, providing direct evidence when diagnosing latency caused by disk configuration or I/O behavior.
Vertically, OpenResty XRay flame graphs span the full software stack: from Lua or Java business logic, through Nginx/Envoy and data-plane middleware, down to the OS kernel and network stack, all the way to blocking rooted in disk configuration. This cross-layer visibility is especially critical for diagnosing issues where symptoms surface at the application layer but the root cause lives in the system layer.
The entire sampling process requires no code changes and no service restarts, and the overhead imposed on the target application is negligible.
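Mechanically, every flame graph variant above rests on the same aggregation step: each sample is a call chain, and identical chains are folded into counts (the “folded stack” format that flame graph renderers consume). A minimal sketch, with invented sample data:

```python
from collections import Counter

# Each sample is one captured call chain, outermost frame first.
samples = [
    ("main", "handle_request", "regex_compile"),
    ("main", "handle_request", "regex_compile"),
    ("main", "handle_request", "send_response"),
]

# Fold identical chains: "frame;frame;frame count" per line.
folded = Counter(";".join(stack) for stack in samples)
for chain, count in folded.most_common():
    print(f"{chain} {count}")
```

The dimension being sampled (CPU time, off-CPU wait time, bytes allocated) changes what a sample represents, but the folding and rendering pipeline stays the same.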
How Is OpenResty XRay Expanding Its Observability Coverage Over Time?
OpenResty XRay’s architecture is designed to extend naturally to new technology stacks. The platform has added non-intrusive analysis capabilities for Java applications and Envoy—two technology stacks that are widely deployed in production yet notoriously difficult to instrument with traditional tracing tools.
How Do You Profile Java Applications Without Any Code Instrumentation?
Achieving function-level observability in Java production environments has always been a difficult engineering challenge. Every mainstream approach carries a cost: adding manual logging requires redeployment; Java Agents intercept at class-loading time, introducing startup overhead and potential compatibility conflicts; APM frameworks offer pre-defined instrumentation, not on-demand sampling. The more fundamental problem is that all three approaches require deployment decisions to be made before the fault ever occurs.
OpenResty XRay’s Java function probes bypass this constraint entirely—no bytecode modification, no Java Agent, no JVM restart. Through out-of-process sampling, the platform observes the running JVM in real time and retrieves function-level data on demand.
Memory analysis works the same way. Traditional heap dumps have two well-known shortcomings: they trigger stop-the-world pauses during heap serialization, and offline snapshots have limited value for diagnosing slow leaks or transient spikes. OpenResty XRay requires no JVM pause and no Safepoint dependency. It covers common memory issues across three dimensions:
- GC object reference analysis: Reconstructs object retention paths to identify the root cause of unexpected memory residency
- GC object allocation count analysis: Ranks code paths by allocation frequency to pinpoint high-frequency sources of GC pressure
- GC object allocation size analysis: Identifies large object allocation sites to inform heap layout optimization
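The difference between the allocation-count and allocation-size views can be shown with a toy model: the same stream of allocation events, ranked two ways. The event data and site names are fabricated for illustration—many small allocations dominate one ranking, a single large object dominates the other:

```python
from collections import Counter

# (allocation site, object size in bytes) — fabricated sample events.
events = [
    ("OrderParser.parse", 64),
    ("OrderParser.parse", 64),
    ("OrderParser.parse", 64),
    ("ReportJob.render", 8_388_608),  # one 8 MiB object
]

by_count = Counter(site for site, _ in events)  # GC-pressure view
by_size = Counter()                             # heap-footprint view
for site, size in events:
    by_size[site] += size

print(by_count.most_common(1))  # hottest allocator by frequency
print(by_size.most_common(1))   # hottest allocator by bytes
```

The two rankings point at different sites, which is why both dimensions are needed: frequent small allocations drive GC pressure, while rare large allocations drive heap footprint.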
How Do You Diagnose Envoy Lua Performance Bottlenecks in Production?
Envoy handles significant traffic processing logic in modern service meshes. When performance issues emerge in its Lua extension layer, the diagnostic path is rarely obvious—Envoy’s own metrics expose only aggregated data, making it nearly impossible to trace the problem to specific Lua code paths.
OpenResty XRay provides real-time, Lua-level flame graph sampling for live Envoy instances, covering both CPU and off-CPU dimensions—no Envoy restart required, no configuration changes, deployable directly in core traffic environments. Combined with built-in intelligent diagnostics, the sampling results directly pinpoint the specific bottleneck code paths.
Further Reading and Real-World Case Studies
In enterprise IT, the true business value of a technology must be proven under real production conditions. The following case studies are drawn from actual production incidents at leading technology companies, demonstrating how OpenResty XRay leverages non-intrusive dynamic tracing to deliver full-stack root cause analysis—without interrupting services, without modifying code, and with negligible performance overhead—across a wide range of technology stacks and complex business scenarios.
- Live Production Debugging Without Downtime: “An Nginx Memory Leak That Couldn’t Be Fixed by a Restart: How We Traced It in Production”
This case challenges the traditional ops instinct of “when in doubt, restart.” Under strict SLA constraints that prohibited both service interruption and code changes, it demonstrates how to capture and pinpoint a hidden infrastructure-level (Nginx) memory leak directly in production—with millisecond-level sampling overhead—without sacrificing business continuity.
- Unified Live and Post-Mortem Analysis: “From Crash to Root Cause: How OpenResty XRay Decoded an Nginx Memory Corruption Incident”
A deep dive into core dump analysis and call chain reconstruction. Demonstrates how to transform post-incident guesswork into a systematic, closed-loop attribution workflow—from failure symptoms to source-code-level root causes.
- On-CPU Flame Graphs and Cross-Layer Penetration: “Dual Concurrent Bottlenecks? How OpenResty XRay’s Multi-Dimensional Analysis Cracked a Complex Performance Problem”
Bridges the observability gap between the C/C++ system layer and the application layer. Using system-level CPU flame graphs for cross-stack penetration, this case is backed by hard data to support architectural performance tuning and compute cost reduction.
- Tracing Slow Leaks in Custom Memory Pools: “How to Use OpenResty XRay to Track a Memory Leak Caused by an LRU Cache”
Traditional profilers are typically powerless against dynamic languages or custom memory allocators. This case demonstrates how high-resolution memory flame graphs bypass those limitations to precisely identify and eliminate a chronic memory leak deep in Lua business logic—specifically, an LRU cache implementation.
- Minute-Level Resolution of a Large-Scale Production Incident: “How OpenResty XRay Diagnosed and Resolved a Major Production Incident at Bilibili”
A P0 incident affecting hundreds of millions of users put enterprise-grade observability tools to the ultimate test. With zero intervention—no downtime, no restarts—OpenResty XRay cut MTTR from hours to minutes, directly recovering business revenue and restoring user trust.
What is OpenResty XRay
OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context.
If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!
About The Author
Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.
Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing”, and “machine coding”, with over 22 years of programming and 16 years of open-source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software platform.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.