In the era of big data, the prevailing belief that “data is the answer” has led to the construction of massive observability data lakes. However, this approach is now revealing its fundamental flaws:

  • Spiraling Costs: Every additional monitoring dimension multiplies storage, network, indexing, and query costs. This budget could otherwise go to higher-value business innovation.
  • Extremely Low Signal-to-Noise Ratio: Within vast amounts of data, 99.9% consists of redundant information describing “normal” system behavior. Truly valuable anomalies are like a needle in a haystack, incredibly difficult to find.
  • Team Burnout: SRE and DevOps teams are forced to dedicate extensive time to configuring collection rules, maintaining data pipelines, and mastering complex query languages, effectively becoming “data janitors” rather than problem solvers.

More data does not equate to faster problem identification. Instead, it often leads to higher costs and deeper confusion. What we truly need is not “more data,” but “just the right data.”

OpenResty XRay’s core mission is to collect the right data, at the right time, with the right precision, thereby unlocking the deepest insights at minimal cost.

“More Data” Doesn’t Mean “More Insight”

Most observability products aim to help users understand a system’s internal state, which typically requires collecting and analyzing vast amounts of logs, metrics, and traces. This data collection incurs high storage and compute costs and imposes performance overhead on the business systems themselves, all without guaranteeing faster root cause analysis (RCA). Enterprises often end up with “more data” rather than “deeper insight.”

OpenResty XRay employs an intelligent, on-demand approach to dynamically adjust data collection granularity, designed to balance cost, performance, and insight. The system operates in a lightweight mode with extremely low overhead under normal conditions. Upon detecting an anomaly, it automatically increases the sampling rate, zooming in on problem details until the root cause is pinpointed, thereby forming a closed loop from monitoring to diagnosis.

The advantages of this strategy include:

  • Cost-effectiveness: Significantly reduces data storage and processing costs, because you no longer pay to collect and retain irrelevant data.
  • Performance Optimization: Lowers the load of data transmission and processing, mitigating the performance impact on the monitored system.
  • Focused Insight: Helps engineers concentrate their attention on the most critical and insightful data, leading to faster problem resolution.
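
As a rough illustration of the on-demand collection described above, the following Python sketch models a sampler that idles at a low rate and zooms in while an anomaly persists. The class name, the rates, and the zoom factor are all assumptions made for this example; they are not OpenResty XRay's actual interface or defaults.

```python
class AdaptiveSampler:
    """Hypothetical two-mode sampler (illustrative only, not XRay's API)."""

    def __init__(self, base_rate=0.01, max_rate=1.0, zoom_factor=4.0):
        self.base_rate = base_rate      # assumed sampling fraction in lightweight mode
        self.max_rate = max_rate        # upper bound while diagnosing
        self.zoom_factor = zoom_factor  # assumed multiplier per anomalous interval
        self.rate = base_rate

    def update(self, anomaly_detected):
        """Return the sampling rate to use for the next interval."""
        if anomaly_detected:
            # Zoom in: raise the rate, but never beyond max_rate.
            self.rate = min(self.max_rate, self.rate * self.zoom_factor)
        else:
            # Problem gone: fall back to the lightweight baseline.
            self.rate = self.base_rate
        return self.rate


if __name__ == "__main__":
    sampler = AdaptiveSampler()
    for anomaly in [False, True, True, True, True, False]:
        print(anomaly, round(sampler.update(anomaly), 2))
```

With these assumed settings the rate stays at the baseline while the system is healthy, climbs step by step during an anomaly, and drops back as soon as the anomaly clears.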

The OpenResty XRay Difference

| Feature | Traditional Observability Products | OpenResty XRay |
| --- | --- | --- |
| Data Volume | Massive: Continuously collects everything, often leading to data overload. | Minimal: Collects high-value data only when anomalies occur, on an as-needed basis. |
| Cost | Very High: Storage, network, and compute costs constantly escalate. | Very Low: Performance overhead is typically less than 5%. |
| Diagnosis Speed | Slow: Requires manual correlation, querying, and filtering of vast amounts of data. | Fast: Automatically triggered, intelligently analyzed, leading directly to the root cause. |
| Root Cause Accuracy | Low: Often identifies symptoms, but struggles to pinpoint specific code lines or configurations. | High: Pinpoints issues precisely down to the code line, function, and system call level. |
| Human Dependency | High: Heavily relies on the SRE team’s individual experience and query skills, driving up human capital costs. | Very Low: Fully automated operation with expert system support, optimizing budget and human resources. |

Powered by Dynamic Tracing Technology

OpenResty XRay is not merely an incremental upgrade to traditional monitoring tools. It leverages advanced dynamic tracing technology to empower technical decision-makers and frontline teams to find clarity amidst complexity:

  • Achieve the most credible root cause insights with minimal data cost: Move beyond data overload and get straight to the heart of the problem.
  • Fully automated operation, productizing expert knowledge: Eliminates the need for additional human intervention, as the system autonomously handles the entire process from anomaly detection to root cause analysis.
  • Intelligent triggering, capturing every critical moment: Precisely identifies both persistent performance bottlenecks and transient spikes.
  • Contextualized reasoning, penetrating from symptoms to root causes: Emulates experienced experts, progressively narrowing down the investigation scope to identify the ultimate source of issues.

Our function probes are hot-swappable, requiring no modifications, pre-configurations, or restarts of the target application. During sampling, the performance overhead on the application is strictly controlled at a very low level, typically less than 5%, making it completely imperceptible to end-users. Once the tool is deactivated, the target application’s performance immediately returns to full speed.

Intelligent Scheduling at its Core: How XRay Automatically Investigates the “Scene of the Crime”

At the heart of OpenResty XRay’s capabilities lies a fully autonomous, unattended intelligent scheduling system. It moves beyond the passive “needle in a haystack” analysis of traditional monitoring, replacing it with an automated workflow that mimics the diagnostic logic of top experts. Its essence can be summarized as: Routine Monitoring → Anomaly Triggering → Chained Reasoning.

[Figure: Chained analyzers]

Phase One: Routine Monitoring

During normal operation, which accounts for roughly 99% of the system’s uptime, OpenResty XRay performs scheduled, lightweight sampling of key indicators at extremely low frequency. We call this process “Routine Monitoring.” Thanks to the non-invasive nature of dynamic tracing, its overhead is typically less than 5% and often too low to measure, so we maintain continuous awareness of system health at virtually zero cost to business performance.

Our sampling strategy is intelligently driven by events, timers, and critical system metrics. This means that even elusive issues of extremely short duration or very low frequency—such as high-latency requests occurring as rarely as one in ten thousand—can be reliably captured, with all necessary contextual details automatically gathered for resolution.
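
To see why the trigger matters for such rare events, consider a back-of-the-envelope calculation. The sketch below is a generic probability estimate, not a description of XRay's internals: it assumes each anomalous request is sampled independently at a fixed rate and asks how likely it is that at least one is caught. The function name and all numbers are assumptions chosen for illustration.

```python
def capture_probability(total_requests, anomaly_rate, sampling_rate):
    """Chance that fixed-rate sampling catches at least one anomalous request.

    Assumes every anomalous request is sampled independently with
    probability sampling_rate (an illustrative simplification).
    """
    expected_anomalies = total_requests * anomaly_rate
    return 1.0 - (1.0 - sampling_rate) ** expected_anomalies


if __name__ == "__main__":
    # One slow request in ten thousand, over one million requests.
    for rate in (0.001, 0.01, 0.1):
        p = capture_probability(1_000_000, 1 / 10_000, rate)
        print(f"sampling rate {rate}: capture probability {p:.3f}")
```

Under these assumptions, sampling 1% of a million requests catches at least one of the roughly 100 slow requests only about 63% of the time, whereas a trigger that fires on the slow request itself does not depend on luck.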

Phase Two: Anomaly Triggering

Once “Routine Monitoring” detects anomalous signals, or system indicators raise alarms (for example, soaring CPU usage, increased memory pressure, or request latency exceeding a threshold), OpenResty XRay’s event-driven engine immediately activates, switching from lightweight mode to deep analysis mode.

We firmly adhere to a core principle: all critical evidence must be captured in real time at the “scene of the crime,” while the problem is occurring, because once the problem has vanished, analyzing the target process is often meaningless. This event-driven mechanism lets OpenResty XRay act at the precise moment a problem arises, so that transient anomalies are not missed.
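
The switch from lightweight mode to deep analysis mode can be pictured as a simple threshold check. The following Python sketch is only a minimal illustration: the metric names, the threshold values, and the two mode labels are hypothetical and do not reflect XRay's real configuration.

```python
# Hypothetical alarm thresholds; names and values are illustrative only.
THRESHOLDS = {
    "cpu_percent": 85.0,
    "memory_percent": 90.0,
    "p99_latency_ms": 500.0,
}


def breached_metrics(sample, thresholds=THRESHOLDS):
    """Return the sorted names of metrics whose value exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if sample.get(name, 0.0) > limit
    )


def next_mode(sample, thresholds=THRESHOLDS):
    """Pick the operating mode for the next interval from one metrics sample."""
    return "deep_analysis" if breached_metrics(sample, thresholds) else "lightweight"


if __name__ == "__main__":
    healthy = {"cpu_percent": 30.0, "memory_percent": 55.0, "p99_latency_ms": 120.0}
    spiking = {"cpu_percent": 97.0, "memory_percent": 55.0, "p99_latency_ms": 900.0}
    print(next_mode(healthy), breached_metrics(healthy))
    print(next_mode(spiking), breached_metrics(spiking))
```

A real trigger would likely also consider event sources and timers, as the text notes; this sketch covers only the metric-threshold case.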

Phase Three: Chained Reasoning

This is what fundamentally distinguishes OpenResty XRay from other tools, and it is the core of its “intelligence.” When an anomaly is triggered, instead of simply running a fixed analyzer, it starts a chain of reasoning steps that dynamically determines “when, on which process, and what analyzer to run”:

  1. Preliminary Diagnosis: First, based on the type of triggered event, the system automatically selects the most suitable general analyzer for a quick scan to establish a high-level overview of the problem.
  2. In-depth Analysis: Next, XRay analyzes the results of the initial diagnosis and, based on these findings, intelligently selects the next specialized tool to deploy.
    • For example: If the preliminary scan indicates that the process spends a significant amount of time off-CPU, the system automatically switches to the Off-CPU analyzer to investigate whether the underlying cause is I/O blocking or lock contention.
    • Another example: If an abnormal proportion of a specific memory allocation function is detected, it will then schedule the memory analysis toolset for a deep dive into memory leaks or allocation hotspots.
  3. Root Cause Identification: This iterative “analyze-decide-reanalyze” process automatically continues like a chain, progressively narrowing down the scope of investigation, until it ultimately pinpoints the specific line of code, function call, system configuration, or kernel parameters causing the issue.

This intelligent scheduling strategy, driven by events, timers, and system metrics, ensures that OpenResty XRay always operates “at the right time, on the right process, with the right analyzer”, thereby achieving truly unattended, precise insight.
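
The “analyze-decide-reanalyze” loop described above can be sketched as a small dispatcher in which each analyzer returns a finding plus the name of the analyzer to run next. Everything in this Python example is hypothetical: the analyzer names, the observation fields, and the cut-off values are invented for illustration and are not XRay's actual analyzers or decision rules.

```python
def cpu_overview(obs):
    """General first-pass scan: decide which specialized analyzer to run next."""
    if obs.get("off_cpu_ratio", 0.0) > 0.5:
        return "process is mostly waiting off-CPU", "off_cpu"
    if obs.get("alloc_cpu_share", 0.0) > 0.3:
        return "allocation functions dominate CPU samples", "memory"
    return "no dominant hotspot found", None


def off_cpu(obs):
    """Specialized step: attribute off-CPU time to locks or I/O."""
    if obs.get("lock_wait_ms", 0) > obs.get("io_wait_ms", 0):
        cause = "lock contention"
    else:
        cause = "I/O blocking"
    return "off-CPU time dominated by " + cause, None


def memory(obs):
    """Specialized step: report the top allocation site."""
    return "allocation hotspot in " + obs.get("top_allocator", "unknown"), None


ANALYZERS = {"cpu_overview": cpu_overview, "off_cpu": off_cpu, "memory": memory}

# Hypothetical mapping from trigger type to the first analyzer in the chain.
ENTRY_POINTS = {
    "high_cpu": "cpu_overview",
    "high_latency": "cpu_overview",
    "memory_growth": "memory",
}


def run_chain(event_type, obs, max_steps=5):
    """Follow analyzer hand-offs until one reports no successor."""
    trail = []
    name = ENTRY_POINTS.get(event_type)
    while name is not None and len(trail) < max_steps:
        finding, next_name = ANALYZERS[name](obs)
        trail.append((name, finding))
        name = next_name
    return trail


if __name__ == "__main__":
    observations = {"off_cpu_ratio": 0.8, "io_wait_ms": 120, "lock_wait_ms": 30}
    for step, finding in run_chain("high_latency", observations):
        print(step, "->", finding)
```

The `max_steps` bound is a design choice for the sketch: it guarantees termination even if a misconfigured analyzer keeps handing off in a cycle.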

Automated Root Cause Insights

OpenResty XRay provides more than just raw data. It automatically samples and generates various types of flame graphs, including C/C++ flame graphs, Lua flame graphs, off-CPU and on-CPU flame graphs, dynamic memory allocation flame graphs, memory object reference flame graphs, file I/O flame graphs, etc. It supports cross-language, multi-layer deep-dive analysis for performance issues across C/C++, Lua, Java, Python, and other mixed-language stacks.

[Figure: Flame graph sample]

Interpreting flame graphs is a common bottleneck for users. Using our proprietary AI technology, we automatically generate detailed root cause analysis reports that translate complex performance charts into intuitive text explanations and pinpoint the crux of the problem for you.

Proactive Analysis Mode: From Reactive Response to Proactive Optimization

OpenResty XRay is not just a post-mortem diagnostic tool. Its “guided analysis” feature supports interactive real-time debugging of applications running in production, eliminating the need for a lengthy “modify-compile-release” cycle. This means you can instantly observe performance changes resulting from debugging operations and quickly validate your modifications, all without impacting the production environment.

XRay Real-world Use Cases

Case Study 1: OpenResty XRay Helps a Customer Resolve Memory Leak Issues

  • Customer Background: A core business system of an enterprise was experiencing continuously growing memory consumption, which severely threatened the stability and availability of its services.

  • Pain Point: Traditional performance analysis tools were unable to pinpoint the root cause of the problem, leading the team to spend a significant amount of time on troubleshooting without being able to identify the specific code causing the memory leak. They urgently needed a solution that could accurately locate and quickly resolve issues to prevent service downtime.

  • Results: By leveraging OpenResty XRay, the technical team precisely identified the source of the memory leak within just a few hours—a Lua closure function that abnormally consumed memory under specific conditions. The team completed the diagnosis and fix without requiring a service restart. Following the fix, system memory usage immediately dropped by 60%, service stability significantly improved, and the entire process had virtually no impact on the production environment.

Case Details: Memory Reduced by 60%: OpenResty XRay Pinpoints Problematic Code for Rapid Fix and Deployment

Case Study 2: OpenResty XRay Successfully Addresses Bilibili’s Critical Online Outage

  • Customer Background: Bilibili’s core API gateway experienced a sudden, large-scale production incident, impacting millions of users.

  • Pain Point: During this incident, the API gateway generated a high volume of 499 errors. Traditional monitoring tools could only identify the symptoms of the problem but lacked the capability to drill down to the code level to pinpoint the root cause of premature connection closures under high concurrency. The team urgently needed a tool that could quickly analyze and resolve underlying component issues to prevent the impact from spreading further.

  • Results: Leveraging OpenResty XRay’s dynamic tracing capabilities, Bilibili’s technical team swiftly uncovered the root cause: an edge case within OpenResty’s cosocket connection pool implementation. XRay not only provided detailed execution paths and variable states but also enabled the team to directly and efficiently implement the fix, successfully resolving this major incident affecting millions of users and demonstrating XRay’s value in emergency troubleshooting.

Detailed Account: OpenResty XRay Analysis and Resolution of Bilibili’s Major Online Incident

Case Study 3: OpenResty XRay Resolves CPU Performance Bottleneck Caused by Kong Plugin

  • Customer Background: An enterprise leveraged Kong API Gateway to manage its core business traffic. However, the system began experiencing unusually high CPU utilization, severely impacting service performance.

  • Pain Point: The team’s initial analysis indicated the problem stemmed from a custom Kong plugin. Yet, pinpointing the exact code segment or function responsible for the significant CPU resource consumption proved challenging. They faced a critical performance bottleneck and urgently needed an effective method to identify and optimize the issue.

  • Results: Utilizing OpenResty XRay’s CPU profiling capabilities, the team swiftly identified the root cause: a seemingly innocuous string.lower() function. A deeper analysis revealed that this function, when processing specific non-ASCII characters, would trigger a Lua exception. This led to the frequent execution of the exception handling path, ultimately creating the CPU bottleneck. Following the fix, system CPU usage immediately dropped by 40%, and service response time decreased by 30%. This case perfectly illustrates XRay’s unique strengths in resolving complex Lua application performance issues.

Specific Details: How we solved a CPU bottleneck caused by a Lua exception in a custom Kong plugin

Conclusion

In today’s highly complex distributed systems, performance and stability challenges can no longer be overcome by simply amassing vast amounts of monitoring data. What we truly need are smarter, more efficient strategies.

OpenResty XRay fundamentally redefines the approach through its core dynamic tracing technology, enabling you to gain the deepest root cause insights with minimal data overhead.

It’s more than just a tool; it’s a modern operational and development philosophy. It empowers your team to find and resolve problems faster, fundamentally reduce costs, and enhance overall engineering efficiency, allowing engineers to focus on the essence of innovation.

If you are overwhelmed by data and seeking a professional, highly efficient performance diagnosis and assurance tool, we invite you to try OpenResty XRay.

What is OpenResty XRay

OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities, and provides actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language, which targets various runtimes such as Stap+, eBPF+, GDB, and ODB, depending on the context.

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.