How to Use OpenResty XRay to Quickly Pinpoint Memory Leaks in C++ Processes
At critical junctures, the stability of core services directly impacts business success or failure. Recently, one of our clients encountered a persistent issue that gave their operations team a major headache: the Nginx service’s memory usage was ballooning uncontrollably, much like a “black hole” devouring resources. While temporary restarts offered brief relief, the problem always resurfaced. In this article, we will share how to leverage OpenResty XRay to swiftly discover and pinpoint this elusive memory leak. We’ll also demonstrate how to transform the complex and time-consuming memory leak troubleshooting process into a repeatable, closed-loop methodology of “diagnose → plan → verify.”
Technical Predicament and Initial Diagnosis
One of our clients’ core Nginx services experienced continuously rising memory usage, akin to a “memory black hole.” Although temporary restarts offered brief relief, the problem quickly resurfaced.
Challenges and Risks:
- Technical Perspective: A C/C++ process memory leak in a production environment is notoriously one of the most challenging issues to resolve. Traditional debuggers such as GDB are impractical on live systems because attaching one pauses the target process, and code auditing is like finding a needle in a haystack: time-consuming and inefficient.
- Business Impact: If left unaddressed, memory consumption would continue to escalate, leading to slow responses, frequent crashes, and ultimately jeopardizing core business continuity. The consequences would include degraded user experience, gradual user churn, damage to brand reputation, and direct economic losses.
Relying on scheduled restarts to “keep the system alive” is merely a stopgap measure, akin to drinking poison to quench thirst, leaving the entire business perpetually exposed to risk. Upon receiving the request for assistance, we immediately utilized our dynamic tracing product, OpenResty XRay, to directly analyze memory allocation in the production environment. Within minutes, the tool automatically generated a memory leak flame graph, providing a clear breakthrough for troubleshooting the issue.
The flame graph clearly illustrated that memory allocation was primarily concentrated in three areas:
- Nginx memory pool
- TLS-related memory
- ngx_dubbo_module functions
Both the Nginx memory pool and TLS are well-established, mature components with an extremely low probability of memory leaks. Therefore, we swiftly narrowed our analysis to the most suspect component: the functions in ngx_dubbo_module.
Flame Graphs: Pinpointing Memory Leak Hotspots
Step One: Identify the Culprit Function.
Leveraging the dynamic tracing capabilities of OpenResty XRay, we can dissect problems with surgical precision, without sifting through massive logs or conducting exhaustive code audits. With the mature components ruled out, our focus narrows to the functions in ngx_dubbo_module.
The memory leak flame graph shows that the widest allocation path, which represents the most memory still held, leads directly to the ngx_dubbo_hessian2_encode_payload_map function, marking it as the primary hotspot for the memory leak.
Step Two: Delve into the Code to Uncover Anomalies.
Upon locating the function’s source code, we quickly identified the following suspicious areas:
ngx_int_t ngx_dubbo_hessian2_encode_payload_map(ngx_pool_t *pool, ngx_array_t *in, ngx_str_t *out)
{
    try {
        // ...
        for (size_t i = 0; i < in->nelts; i++) {
            string key((const char*)kv[i].key.data, kv[i].key.len);
            if (0 == (key.compare("body"))) {
                // Here, an object is created using `new`
                ByteArray *tmp = new ByteArray((const char*)kv[i].value.data, kv[i].value.len);
                ObjectValue key_obj(key);
                ObjectValue value_obj((Object*)tmp);
                // The object is placed into a Map, but no `delete` operation is subsequently observed
                strMap.put(key_obj, value_obj);
            } else {
                // ...
            }
        }
        // ...
    } catch (...) {
        // ...
    }
}
In the code, a tmp object is created using new ByteArray, but upon reviewing the entire codebase, we found no corresponding delete operation. Our intuition suggests this is where the problem lies. After the tmp object is wrapped as an ObjectValue, it is passed to the strMap.put method. Who is responsible for managing its lifecycle?
Step Three: Tracing the Object’s Lifecycle to its Source.
To uncover the truth, we must delve into the implementation details of the hessian2 library.
First, the tmp pointer is used to construct an ObjectValue: ObjectValue value_obj((Object*)tmp);. Its corresponding constructor is quite simple; it merely stores the pointer and does not involve any smart pointers or ownership transfer operations.
// ObjectValue Constructor
ObjectValue(Object* obj) : _type(OBJ) { _value.obj = obj; }
Next, value_obj is passed to the strMap.put method. Internally, this method calls value.get_object() to retrieve the original object pointer:
// Map::put Method Implementation
void Map::put(const ObjectValue& key, const ObjectValue& value,
              bool chain_delete_key, bool chain_delete_value) {
    pair<Object*, bool> ret_key = key.get_object();
    pair<Object*, bool> ret_value = value.get_object();
    // ...
    // The boolean value returned by `get_object` determines whether the object is added to the "deletion chain"
    if (ret_value.second || chain_delete_value) {
        _delete_chain.push_back(ret_value.first);
    }
    _map[ret_key.first] = ret_value.first;
}
A Map contains an internal _delete_chain for managing memory that needs to be freed. An object’s inclusion in this “delete chain” depends on the bool value returned by get_object(), or the chain_delete_value parameter passed to the put method.
Now, let’s examine the crucial ObjectValue::get_object method:
// ObjectValue::get_object Method Implementation
pair<Object*, bool> ObjectValue::get_object() const {
    switch (_type) {
        // Our `tmp` object is of type `OBJ`
        case OBJ: return pair<Object*, bool>(_value.obj, false);
        case C_OBJ: return pair<Object*, bool>(_value.obj, false);
        // Other types will create a new object using `new` and return `true`
        case IVAL: return pair<Object*, bool>(new Integer(_value.ival), true);
        // ...
        default: return pair<Object*, bool>(NULL, false);
    }
}
The root cause has been identified! When _type is OBJ, get_object() returns false as the second element of the pair. This signals to the caller: “You are not responsible for managing this pointer; I have not allocated new memory.” However, our tmp object was, in fact, allocated externally using new, and nothing else takes ownership of it.
Because get_object() returns false, and chain_delete_value was not explicitly provided during the put method call (it defaults to false), the condition if (ret_value.second || chain_delete_value) evaluates to false. Consequently, the tmp object was never added to the _delete_chain. As requests are processed, countless ByteArray objects are created but never deallocated, leading directly to memory leaks.
From Reactive to Proactive: XRay’s New Approach to Troubleshooting
In the pursuit of system stability, many teams find themselves caught in a vicious cycle: to solve problems, they collect more data for more clues; to collect more data, they add more monitoring metrics, build more complex dashboards, and invest huge budgets in additional monitoring tools. The result is an overwhelming amount of fragmented information, with data proliferating while the signal-to-noise ratio plummets. Engineers are left sifting through a “data graveyard,” searching for valuable insights like prospectors panning for gold.
The fundamental flaw of this model is its failure to adequately connect technical problems, such as memory leaks, with their true business costs:
- A memory leak alert, seemingly just a fluctuation in technical metrics, actually represents a significant drain on engineers’ valuable time.
- A frequent production incident not only threatens system stability but also carries the potential for business disruption.
- Continuous, high monitoring investment often fails to yield commensurate actionable insights.
Ultimately, the issue isn’t “too little data,” but “too few useful answers.” A truly efficient observability tool should act like a “scalpel,” directly pinpointing the root cause of a problem, rather than handing teams another “shovel” and forcing them to endlessly dig through data day and night. This is precisely the core challenge that the next-generation observability platform aims to solve:
- Moving from surface-level metrics to causal chains
- Shifting from passive alerts to active analysis and insights
- Evolving from consuming human resources to truly unlocking R&D productivity
The OpenResty XRay Closed-Loop Approach
In this context, OpenResty XRay introduces a transformative methodology: converting complex, time-consuming processes into a repeatable, self-sustaining cycle. We define this as “Diagnose → Plan → Verify”.
- Pinpoint Accuracy: Traditional methods often involve “sifting through more data, hoping to find a clue.” XRay’s approach is fundamentally different. Utilizing advanced visualization tools like flame graphs, it directly identifies the problem’s location, enabling teams to uncover the root cause within minutes. It delivers not just more monitoring metrics, but critical, actionable insights that drive immediate resolution.
- Evidence-Based Solutioning: When a problem is precisely traced to a specific line of code or a particular C/C++ process, the remediation plan becomes direct and targeted. Teams can bypass iterative trial and error, instead formulating clear, effective action steps based on irrefutable evidence.
- Instant Validation for a Complete Cycle: After a fix is deployed, XRay can re-analyze the system to confirm the issue has been thoroughly resolved. This rapid validation mechanism ensures a truly closed-loop process: diagnosis, action, and verification are seamlessly integrated.
The value derived from this closed-loop capability extends far beyond resolving individual bugs:
- Technical Empowerment: It empowers teams to non-invasively gain deep insights into C/C++ processes within production environments, simplifying and visualizing inherently complex issues.
- Strategic Business Impact:
  - Enhanced Business Stability: Proactively eliminates potential failures, ensuring robust service high availability.
  - Accelerated R&D Efficiency: Frees engineers from tedious, inefficient debugging cycles, allowing them to focus on higher-value business innovation.
  - Optimized Resource Utilization: Prevents over-provisioning of resources due to unidentified performance bottlenecks, leading to genuine cost control.
In an era of data overload, technical teams no longer need just more monitoring dashboards. What they truly require is a smarter, more dependable approach to quickly identify and resolve problems. This, fundamentally, is the true power of observability.
What is OpenResty XRay
OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the contexts.
About The Author
Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.
Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo! He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.