Find the largest Python objects or values taking the most RAM (using OpenResty XRay)
In this tutorial, you will learn how to quickly identify the largest Python memory objects or values in your applications. Thanks to the power of OpenResty XRay, we can pinpoint these garbage-collectible objects via detailed data reference paths. We’ll also demonstrate the Python GC object memory distribution flame graphs that we invented. Memory leaks and large memory footprints are no longer a mystery in your Python code!
Problem: high memory usage
Let’s run the top command to check the memory usage of each process. As shown, the Python process gunicorn is consuming more than 600 MB of memory.
Let’s run the ps command to see more details about this process.
Here we can see that it is using the standard Python 3 binary executable that comes with the Linux distribution.
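The memory figure shown by top can also be read from a script. The following is a minimal sketch, assuming a Linux system and assuming that the figure of interest is the resident set size (the VmRSS field of /proc/<pid>/status). The helper names parse_vm_rss_kb and read_rss_kb are made up for this illustration and are not part of OpenResty XRay.

```python
import os
from typing import Optional


def parse_vm_rss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    614400 kB".
            return int(line.split()[1])
    return None


def read_rss_kb(pid: int) -> Optional[int]:
    """Read the resident set size of a process; None if it is unavailable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vm_rss_kb(f.read())
    except OSError:
        # Not on Linux, no such process, or no permission to read it.
        return None


# Example: report the resident memory of the current process.
print(read_rss_kb(os.getpid()))
```

To check another process, pass its PID (as shown by top or ps) instead of os.getpid().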
Use the guided analysis feature of OpenResty XRay to find the largest Python objects or values taking the most RAM
Let’s use OpenResty XRay to check out this unmodified process. We can analyze it in real time and figure out what’s going on. Open the OpenResty XRay web console in the web browser.
Make sure it is the right machine you are watching.
You can choose the right machine from the list below if the current one is not correct.
Go to the “Guided Analysis” page.
Here you can see different types of problems that you can diagnose.
Let’s select “High memory usage”.
Click on “Next”.
Select the Python application.
Select the process that consumes almost 600 MB of the RAM. This is what we saw previously in top.
Make sure that the application type is right. Usually the default should be correct.
OpenResty XRay can analyze multiple language levels at the same time. We’ll keep both Python and C/C++ selected.
We can also set the maximum analyzing time. We’ll leave it as 300 seconds, which is the default value.
Let’s start analyzing.
The system will keep performing different rounds of analysis. Now it’s executing the first round.
The first round is done and it’s on to the second one already. That’s enough for this case.
Let’s stop analyzing now.
We can see it automatically creates an analysis report.
This is the type of problem we are going to diagnose. It’s memory.
We can see most of the memory is allocated by the libc allocator. It’s more than 580 MB.
This is the Python GC object reference path occupying the most memory. Obviously, this Python VM uses the libc allocator to request memory.
This is the Python dictionary holding all the loaded Python modules.
Here is the Python module named order_service.service.order.prev_processor.
Inside this module, there is a field named order_name_cache.
And this field value is a Python dictionary.
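The reference path above runs from the dictionary of loaded modules, to a module, to a module-level field, to a large dictionary. A rough in-process approximation of that walk can be sketched with the standard library. This is only an illustration and not how OpenResty XRay works: XRay inspects the unmodified process from the outside, while this sketch must run inside the interpreter, and sys.getsizeof ignores allocator overhead. The function names deep_size and largest_module_fields are hypothetical.

```python
import sys
import types


def deep_size(obj, seen=None):
    """Approximate the bytes reachable from obj through built-in containers."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # Count shared or cyclic objects only once.
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size(key, seen) + deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size(item, seen)
    return size


def largest_module_fields(modules=None, top=5):
    """Rank (module name, field name, size) for module-level containers."""
    if modules is None:
        modules = sys.modules  # The dictionary holding all loaded modules.
    results = []
    for mod_name, mod in list(modules.items()):
        if not isinstance(mod, types.ModuleType):
            continue
        for field, value in list(vars(mod).items()):
            if isinstance(value, (dict, list, set)):
                results.append((mod_name, field, deep_size(value)))
    results.sort(key=lambda r: r[2], reverse=True)
    return results[:top]
```

Calling largest_module_fields() from a debugging hook inside the application would list the module-level containers with the largest approximate footprint, which is a much coarser view than the report above.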
Click to see more details.
This GC object reference path is automatically inferred from this Python-land GC object memory distribution flame graph.
Below are more detailed explanations and suggestions regarding the current issue. It talks about the prev_processor module we saw earlier.
It also mentions the order_name_cache dictionary.
Let’s go back to the data reference path. Click the icon to copy the module name.
On the terminal, use the find command to find the Python source file.
Paste the module name we just copied. Here we abuse the dots in the module name as a wildcard for the grep command.
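The dot trick works because a dot in a regular expression matches any single character, including the slash that separates directories. Here is a small sketch of the same idea using Python's re module. The file paths are hypothetical stand-ins for the output of find, since the real install location is not shown in this tutorial.

```python
import re

# Module name as copied from the report.
module_name = "order_service.service.order.prev_processor"

# Hypothetical file paths standing in for the output of `find`.
candidate_paths = [
    "/opt/app/order_service/service/order/prev_processor.py",
    "/opt/app/order_service/service/order/next_processor.py",
    "/opt/app/order_service/__init__.py",
]

# Used as a regex, each "." matches any single character, including "/".
pattern = re.compile(module_name)
matches = [p for p in candidate_paths if pattern.search(p)]
print(matches)

# A stricter alternative: turn the dots into slashes explicitly.
strict_suffix = module_name.replace(".", "/") + ".py"
strict_matches = [p for p in candidate_paths if p.endswith(strict_suffix)]
print(strict_matches)
```

The loose pattern is convenient for a quick search, while the strict suffix avoids accidental matches on paths where some other character sits in the position of a dot.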
Copy the full file path. Use the vim editor to open the source file. You can use any editor you like.
We can find the order_name_cache field variable mentioned in the previous report.
We can also examine how this field variable is being used, to determine if there is a memory leak.
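When reviewing such a field, one common pattern to look for is a module-level dictionary that gains an entry for every distinct key and never removes any. The sketch below is purely hypothetical: the actual source of prev_processor is not shown in this tutorial, and lookup_unbounded and BoundedCache are names invented for illustration. It contrasts an unbounded cache with a size-limited LRU cache built on collections.OrderedDict.

```python
from collections import OrderedDict

# Hypothetical stand-in for a module-level cache that is never pruned.
order_name_cache = {}


def lookup_unbounded(order_id):
    """Cache every distinct key forever; memory grows with the key count."""
    if order_id not in order_name_cache:
        order_name_cache[order_id] = "order-%d" % order_id
    return order_name_cache[order_id]


class BoundedCache:
    """A small LRU cache that evicts the least recently used entry."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, compute):
        if key in self._data:
            self._data.move_to_end(key)  # Mark as most recently used.
            return self._data[key]
        value = compute(key)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # Drop the oldest entry.
        return value

    def __len__(self):
        return len(self._data)


bounded = BoundedCache(max_entries=100)
for i in range(1000):
    lookup_unbounded(i)
    bounded.get(i, lambda k: "order-%d" % k)

# The unbounded dictionary keeps all 1000 entries; the LRU cache keeps 100.
print(len(order_name_cache), len(bounded))
```

Whether a bounded cache is the right fix depends on how the real code uses the field; the point here is only what unbounded growth looks like in source form.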
Automatic analysis and reports
OpenResty XRay can also monitor online processes automatically and show analysis reports.
Go to the “Insights” page.
You can find the reports on the “Insights” page for daily and weekly periods, so you don’t have to use the “Guided Analysis” feature. That said, “Guided Analysis” is still useful for application development and demonstration purposes.
What is OpenResty XRay
OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language, which targets various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context.
If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!
About The Author
Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.
Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at internationally renowned tech companies such as Yahoo! and Cloudflare. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. And its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.