In this tutorial, you will learn how to quickly identify the largest Python memory objects or values in your applications. Thanks to the power of OpenResty XRay, we can pinpoint these garbage-collectible objects via detailed data reference paths. We’ll also demonstrate the Python GC object memory distribution flame graphs that we invented. Memory leaks and large memory footprints in your Python code are no longer a mystery!

Problem: high memory usage

Let’s run the top command to check the memory usage of each process. As shown, the Python process gunicorn is consuming more than 600 MB of memory.

Screenshot
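
The same per-process figure can also be read programmatically. Below is a minimal sketch, assuming a Linux system where /proc/<pid>/status exposes a VmRSS line (roughly the resident memory column that top reports). The helper names and the sample text are hypothetical and only illustrate the parsing; they are not taken from the real process.

```python
def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      614400 kB".
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def read_rss_kb(pid):
    """Read the resident set size of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kb(f.read())


# Illustrative sample, not real output from the gunicorn process.
sample_status = "Name:\tgunicorn\nVmRSS:\t  614400 kB\n"
sample_rss_kb = parse_vm_rss_kb(sample_status)
print(f"resident memory: {sample_rss_kb / 1024:.0f} MiB")
```

On a live machine, read_rss_kb(pid) could be called with the PID shown by top. This only tells us how much memory the process holds, not which objects hold it.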

Let’s run the ps command to see more details about this process.

Screenshot

Here we can see that it is using the standard Python 3 binary executable that comes with the Linux distribution.

Screenshot

Use the guided analysis feature of OpenResty XRay to find the largest Python objects or values taking the most RAM

Let’s use OpenResty XRay to check out this unmodified process. We can analyze it in real time and figure out what’s going on. Open the OpenResty XRay web console in the web browser.

Screenshot

Make sure you are watching the right machine.

Screenshot

You can choose the right machine from the list below if the current one is not correct.

Screenshot

Go to the “Guided Analysis” page.

Screenshot

Here you can see different types of problems that you can diagnose.

Screenshot

Let’s select “High memory usage”.

Screenshot

Click on “Next”.

Screenshot

Select the Python application.

Screenshot

Select the process that consumes about 600 MB of RAM. This is the one we saw earlier in top.

Screenshot

Make sure that the application type is right. Usually the default should be correct.

Screenshot

OpenResty XRay can analyze multiple language levels at the same time. We’ll keep both Python and C/C++ selected.

Screenshot

We can also set the maximum analyzing time. We’ll leave it as 300 seconds, which is the default value.

Screenshot

Let’s start analyzing.

Screenshot

The system will keep performing different rounds of analysis. Now it’s executing the first round.

Screenshot

The first round is done and it’s on to the second one already. That’s enough for this case.

Screenshot

Let’s stop analyzing now.

Screenshot

We can see it automatically creates an analysis report.

Screenshot

This is the type of problem we are going to diagnose. It’s memory.

Screenshot

We can see most of the memory is allocated by the libc allocator. It’s more than 580 MB.

Screenshot

This is the Python GC object reference path occupying the most memory. Obviously, this Python VM uses the libc allocator to request memory.

Screenshot

This is the Python dictionary holding all the loaded Python modules.

Screenshot

Here is the Python module named order_service.service.order.prev_processor.

Screenshot

Inside this module, there is a field named order_name_cache.

Screenshot

And this field value is a Python dictionary.

Screenshot
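
The reference path in the report can be pictured in plain Python. In CPython, the dictionary of loaded modules is sys.modules, so the path reads: sys.modules, then the module, then the module field, then the dictionary. The sketch below uses a stand-in module built with types.ModuleType and made-up cache contents, because the real module's code is not part of this walkthrough. Only the module and field names come from the report.

```python
import sys
import types

MODULE_NAME = "order_service.service.order.prev_processor"
FIELD_NAME = "order_name_cache"


def install_demo_module():
    """Register a hypothetical stand-in for the real module."""
    mod = types.ModuleType(MODULE_NAME)
    # Made-up contents; the real cache entries are unknown here.
    mod.order_name_cache = {i: f"order-{i}" for i in range(1000)}
    sys.modules[MODULE_NAME] = mod
    return mod


def follow_reference_path(module_name, field_name):
    """Walk sys.modules -> module -> field, mirroring the reported path."""
    module = sys.modules[module_name]
    return vars(module)[field_name]


def approximate_dict_size(d):
    """Rough size in bytes: the dict plus its direct keys and values.

    sys.getsizeof does not follow references, so nested containers
    inside the values are not counted.
    """
    return sys.getsizeof(d) + sum(
        sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items()
    )


install_demo_module()
cache = follow_reference_path(MODULE_NAME, FIELD_NAME)
print(type(cache).__name__, len(cache), approximate_dict_size(cache))
```

Unlike this sketch, which has to run inside the interpreter, OpenResty XRay obtains the path from the unmodified running process.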

Click to see more details.

Screenshot

This GC object reference path is automatically inferred from this Python-land GC object memory distribution flame graph.

Screenshot

Below are more detailed explanations and suggestions regarding the current issue. It talks about the prev_processor module we saw earlier.

Screenshot

It also mentions the order_name_cache dictionary.

Screenshot

Let’s go back to the data reference path. Click the icon to copy the module name.

Screenshot

On the terminal, use the find command to find the Python source file.

Screenshot

Paste the module name we just copied. Here we abuse the dots in the module name as a wildcard for the grep command.

Screenshot
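
The dot trick works because an unescaped dot in a regular expression matches any single character, including the / between directories. Here is a small sketch of the same idea using Python's re module; the candidate file paths are hypothetical.

```python
import re

module_name = "order_service.service.order.prev_processor"

# Hypothetical file paths, standing in for the output of find.
candidate_paths = [
    "/opt/app/order_service/service/order/prev_processor.py",
    "/opt/app/order_service/service/order/next_processor.py",
]

# Loose match: each "." in the module name matches any character,
# so it also matches the "/" path separators.
loose_pattern = re.compile(module_name)
loose_matches = [p for p in candidate_paths if loose_pattern.search(p)]

# Stricter alternative: turn the dots into "/" and escape the result.
strict_pattern = re.compile(re.escape(module_name.replace(".", "/")) + r"\.py$")
strict_matches = [p for p in candidate_paths if strict_pattern.search(p)]

print(loose_matches)
print(strict_matches)
```

The loose form is quicker to type after pasting the copied name. The strict form avoids accidental matches when other files have similar names.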

Copy the full file path. Use the vim editor to open the source file. You can use any editor you like.

Screenshot

We can find the order_name_cache field variable mentioned in the previous report.

Screenshot

We can also examine how this field variable is being used, to determine if there is a memory leak.

Screenshot
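
As an illustration of what to look for, the sketch below shows a common leak pattern: a module-level dictionary that only ever gains entries. The lookup function, the cache keys, and the bounded alternative are all hypothetical; the actual source of prev_processor is not reproduced in this tutorial.

```python
from collections import OrderedDict

# Module-level dict: it lives as long as the module stays loaded.
order_name_cache = {}


def lookup_order_name_unbounded(order_id):
    """Hypothetical lookup that caches every result and never evicts."""
    if order_id not in order_name_cache:
        order_name_cache[order_id] = f"order-{order_id}"
    return order_name_cache[order_id]


class BoundedCache:
    """A small LRU cache that evicts the least recently used entry."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, compute):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = compute(key)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the oldest entry
        return value

    def __len__(self):
        return len(self._data)


for i in range(10_000):
    lookup_order_name_unbounded(i)

bounded = BoundedCache(max_entries=100)
for i in range(10_000):
    bounded.get(i, lambda k: f"order-{k}")

print(len(order_name_cache), len(bounded))
```

If the real field is written to on every request and never trimmed, as in the first function, its size grows with the number of distinct keys. A size limit or an expiry policy is one possible remedy.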

Automatic analysis and reports

OpenResty XRay can also monitor online processes automatically and show analysis reports.

Screenshot

Go to the “Insights” page.

Screenshot

You can find the daily and weekly reports on the “Insights” page, so you don’t have to use the “Guided Analysis” feature.

Screenshot

Still, “Guided Analysis” is useful for application development and demonstration purposes.

Screenshot

What is OpenResty XRay

OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context.

If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. And its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.