In this tutorial, you will learn how to quantitatively analyze the memory distribution inside a web application built on Python’s Django framework. We will pinpoint the most memory-consuming Python objects via the detailed data reference paths found by OpenResty XRay. OpenResty XRay automatically generates and interprets Python GC object memory distribution flame graphs, a type of flame graph that we invented. You can use it to examine any Python object in detail, such as a string or a dictionary.

django.png

Problem: high memory usage

Run the ps command to check all Python3 processes. As shown, one process takes up 86MB of memory.
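If you prefer to script this check, here is a minimal Python sketch. It assumes a procps-style ps that accepts `-eo pid,rss,comm`; the exact command used in the screenshot is not shown here, and the helper names `parse_ps_rss` and `python_processes` are hypothetical.

```python
import subprocess


def parse_ps_rss(ps_output, name_prefix="python3"):
    """Return (pid, rss_mb, command) tuples for matching processes, largest first.

    Expects text shaped like the output of `ps -eo pid,rss,comm`, where RSS is in KB.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, rss_kb, command = fields
        if command.startswith(name_prefix):
            rows.append((int(pid), int(rss_kb) / 1024, command))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def python_processes():
    """Run ps (assumed to be available) and list the Python 3 processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_rss(result.stdout)
```

Calling `python_processes()` on the target machine should list the same process that ps shows, with its resident memory converted to MB.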

Screenshot

We can see it is the standard Python binary executable shipped with the Linux distribution.

Screenshot

Use the guided analysis feature of OpenResty XRay to analyze the Django application

Let’s use OpenResty XRay to check out this unmodified process. We can analyze it in real time and figure out what’s going on.

Open the OpenResty XRay web console in the web browser.

Screenshot

Make sure it is the right machine you are watching.

Screenshot

You can choose the right machine from the list below if the current one is not correct.

Screenshot

Go to the “Guided Analysis” page.

Screenshot

Here you can see different types of problems that you can diagnose.

Screenshot

Let’s select “High memory usage”.

Screenshot

Click on “Next”.

Screenshot

Select the Python Django application.

Screenshot

Select the process that consumes 84MB of memory. This is the process we saw earlier in the ps output.

Screenshot

Make sure that the application type is right.

Screenshot

Usually the default should be correct.

Screenshot

OpenResty XRay can analyze multiple language levels at the same time. We’ll keep both Python and C/C++ selected.

Screenshot

We can also set the maximum analysis time. We’ll leave it at 300 seconds, which is the default value.

Screenshot

Let’s start analyzing.

Screenshot

The system will keep performing different rounds of analysis. Now it’s executing the first round.

Screenshot

The first round is done and it’s on to the second one already. That’s enough for this case.

Screenshot

Let’s stop analyzing.

Screenshot

It shows that the system is generating a report for the current analysis.

Screenshot

We can see it automatically created an analysis report.

Screenshot

This is the type of problem we diagnose. It’s “Memory”.

Screenshot

This data reference path shows that the Python modules loaded in the interpreter use 38MB of memory.

Screenshot

Click “More” to see details.

Screenshot

The most significant data reference path was automatically inferred from this Python GC object memory distribution flame graph.

Screenshot

Below are more explanations and suggestions about the issue.

Screenshot

It mentions .modules.

Screenshot

And mentions that it contains everything related to Python modules.
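For a rough in-process view of the same idea, the sketch below walks `sys.modules` and estimates each module's size. It assumes that the `.modules` entry in the report corresponds to what Python exposes as `sys.modules`, and it only counts each module's namespace dictionary plus the shallow size of its direct values, so the numbers will be far smaller than what OpenResty XRay measures by following full reference paths. The function name `shallow_module_sizes` is hypothetical.

```python
import sys


def shallow_module_sizes(top=5):
    """Return the `top` largest loaded modules as (approx_bytes, name) pairs.

    Only a shallow estimate: the module's namespace dict plus the direct
    size of each value in it. Nested objects are not followed.
    """
    sizes = []
    for name, module in list(sys.modules.items()):
        namespace = getattr(module, "__dict__", None)
        if not isinstance(namespace, dict):
            continue  # skip None placeholders and unusual entries
        total = sys.getsizeof(namespace)
        for value in list(namespace.values()):
            total += sys.getsizeof(value, 0)  # 0 if the object cannot report a size
        sizes.append((total, name))
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    for approx_bytes, name in shallow_module_sizes():
        print(f"{approx_bytes:>10}  {name}")
```

Running this inside the Django process (for example from a management shell) gives a quick ranking, but it needs code to run inside the application, whereas the analysis above works on the unmodified process from outside.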

Screenshot

Look at the flame graph again. Zoom in to find out which modules are taking up the most memory.

Screenshot

We can see these 1521 Python modules are combined together. Each of these modules takes less than 1% of the total memory.

Screenshot

This module takes up a lot of memory. To see more details, click on it to zoom in.

Screenshot

You can see that the openpyxl.utils.cell module takes up more than 2.6MB of memory. openpyxl is a Python library for manipulating Excel files.

Screenshot

Copy the module name.

Screenshot

Now open the terminal and use the find command to locate the corresponding source code file in the project directory. Here, the project directory is the root of the location where the Python modules are installed.

Screenshot

Paste the module name we just copied.

Screenshot

Screenshot

Here we abuse the dots in the module name as a wildcard for the grep command.
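The same lookup can be sketched in Python. The first helper asks the import system where a module's source lives, and the second shows why the dotted module name works as a search pattern: in a regular expression, each dot matches any single character, including the `/` in a file path. Both helper names are hypothetical, and the sample path is made up for illustration.

```python
import importlib.util
import re


def module_source_path(module_name):
    """Return the file the import system would load for module_name, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        return None  # a parent package is not installed
    return spec.origin if spec else None


def match_paths(module_name, paths):
    """Keep the paths matched by the raw module name used as a regex.

    The unescaped dots match any character, so "a.b.c" matches "a/b/c.py".
    """
    pattern = re.compile(module_name)
    return [path for path in paths if pattern.search(path)]


if __name__ == "__main__":
    print(module_source_path("json.decoder"))
    # Hypothetical install path, for illustration only.
    print(match_paths("openpyxl.utils.cell",
                      ["/opt/app/lib/openpyxl/utils/cell.py",
                       "/opt/app/lib/openpyxl/utils/units.py"]))
```

Note that `find_spec` imports the parent packages of a dotted name, so it has to run in an environment where the module is installed.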

Screenshot

Copy the full file path.

Screenshot

Use the vim editor to open the source file and look at the underlying Python code. You can use any editor you like.

Screenshot

Return to the flame graph.

The openpyxl module uses two dictionaries, _COL_STRING_CACHE and _STRING_COL_CACHE. They are used for converting between Excel cell coordinates and column names. These two variables use a lot of memory.
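To get a feel for why a pair of lookup dictionaries like these can add up, here is a simplified sketch. It is not openpyxl's actual code: it assumes the caches map between column numbers and column letters for every one- to three-letter column name (A through ZZZ, 18278 entries), and all function names are hypothetical.

```python
import sys


def column_letter(index):
    """Convert a 1-based column number to letters: 1 -> A, 27 -> AA."""
    letters = []
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def build_caches(max_column=18278):
    """Precompute both lookup directions, trading memory for speed."""
    letter_to_index = {}
    index_to_letter = {}
    for index in range(1, max_column + 1):
        letter = column_letter(index)
        index_to_letter[index] = letter
        letter_to_index[letter] = index
    return letter_to_index, index_to_letter


def approx_dict_bytes(mapping):
    """Shallow estimate: the dict itself plus each key and value object.

    Keys and values shared between the two caches are counted once per dict,
    so adding both estimates together overstates the real total.
    """
    return sys.getsizeof(mapping) + sum(
        sys.getsizeof(key) + sys.getsizeof(value) for key, value in mapping.items()
    )


if __name__ == "__main__":
    letter_to_index, index_to_letter = build_caches()
    print(approx_dict_bytes(letter_to_index), approx_dict_bytes(index_to_letter))
```

Precomputing every entry at import time makes lookups fast, but the memory is paid even if the application never touches a spreadsheet.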

Screenshot

Copy the name of the first variable.

Screenshot

Search for _COL_STRING_CACHE to locate the object and find every place in this file where the variable is used.

Screenshot

The _STRING_COL_CACHE variable can be found in the same way.

Screenshot

You can use the same method to analyze other modules on the flame graph, such as the linecache module. This is a Python standard library module that caches file contents. It uses 648KB of memory.

Screenshot

Inside it, a variable named cache holds most of that memory.
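You can inspect this cache from inside a Python process as well. The sketch below assumes CPython's current layout, where `linecache.cache` is a module-level dictionary whose fully loaded entries are 4-tuples with the list of lines at index 2. That layout is an implementation detail rather than a documented API, and the helper name `linecache_footprint` is hypothetical. `linecache.clearcache()` is the documented way to release the cached lines.

```python
import linecache
import sys


def linecache_footprint():
    """Return (cached_files, approx_bytes) for the lines held by linecache."""
    files = 0
    total = 0
    for entry in list(linecache.cache.values()):
        if len(entry) != 4:
            continue  # lazily registered entries hold no lines yet
        lines = entry[2]
        files += 1
        total += sys.getsizeof(lines) + sum(sys.getsizeof(line) for line in lines)
    return files, total


if __name__ == "__main__":
    linecache.getlines(linecache.__file__)  # load one file into the cache
    print(linecache_footprint())
    linecache.clearcache()                  # drop every cached file
    print(linecache_footprint())
```

The cache typically grows when tracebacks are formatted, because the traceback machinery reads source lines through linecache.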

Screenshot

Python uses the Libc allocator and the mmap syscall to allocate memory. The report shows that the Libc allocator uses 18.7MB of memory.

Screenshot

Automatic analysis and reports

OpenResty XRay can also monitor online processes automatically and show analysis reports.

Go to the “Insights” page.

Screenshot

You can find the daily and weekly reports on the “Insights” page.

Screenshot

For this reason, you don’t have to use the “Guided Analysis” feature. Guided analysis is useful for application development and demonstration purposes.

Screenshot

What is OpenResty XRay

OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context.

If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at internationally renowned tech companies such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing”, and “machine coding”, with over 22 years of programming and 16 years of open-source experience. Yichun is well known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.