In this tutorial, you will learn how to analyze the quantitative memory distribution inside a web application based on Python’s Django framework. We can pinpoint the most memory-consuming Python objects via the detailed data reference paths found by OpenResty XRay. OpenResty XRay automatically generates and interprets the Python GC object memory distribution flame graphs that we invented. You can use it to examine any Python object in detail, such as a string or a dictionary.
Run the ps command to check all Python3 processes. As shown, one process takes up 86MB of memory.
We can see it is the standard Python binary executable shipped with the Linux distribution.
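For readers who prefer to script this check, here is a minimal sketch of the same idea in Python. It assumes a Linux system with /proc mounted, and the helper names parse_rss_kb and python_processes are made up for illustration. It reads the resident set size from each process's status file, which is roughly the figure ps reports in its RSS column.

```python
import os
import re


def parse_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def python_processes(proc_root="/proc"):
    """Yield (pid, rss_kb, cmdline) for processes whose command line mentions python3."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace").strip()
            with open(os.path.join(proc_root, entry, "status"), errors="replace") as f:
                rss_kb = parse_rss_kb(f.read())
        except OSError:
            continue  # the process exited or is not readable
        if "python3" in cmdline and rss_kb is not None:
            yield int(entry), rss_kb, cmdline


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, rss_kb, cmdline in python_processes():
            print(f"{pid:>7} {rss_kb / 1024:8.1f} MB  {cmdline}")
```

Kernel threads have no VmRSS line, so they are skipped automatically.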
Let’s use OpenResty XRay to check out this unmodified process. We can analyze it in real time and figure out what’s going on.
Open the OpenResty XRay web console in the web browser.
Make sure it is the right machine you are watching.
You can choose the right machine from the list below if the current one is not correct.
Go to the “Guided Analysis” page.
Here you can see different types of problems that you can diagnose.
Let’s select “High memory usage”.
Click on “Next”.
Select the Python Django application.
Select the process that consumes 84MB of memory. This is the process we saw previously in the ps command output.
Make sure that the application type is right.
Usually the default should be correct.
OpenResty XRay can analyze multiple language levels at the same time. We’ll keep both Python and C/C++ selected.
We can also set the maximum analyzing time. We’ll leave it as 300 seconds, which is the default value.
Let’s start analyzing.
The system will keep performing different rounds of analysis. Now it’s executing the first round.
The first round is done and it’s on to the second one already. That’s enough for this case.
Let’s stop analyzing.
It shows that the system is generating a report for the current analysis.
We can see it automatically created an analysis report.
This is the type of problem we diagnose. It’s “Memory”.
This data reference path shows that the Python modules loaded in the interpreter use 38MB of memory.
Click “More” to see details.
The most significant data reference path was automatically inferred from this Python GC object memory distribution flame graph.
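To get a rough feel for what a GC object memory distribution is, you can tally GC-tracked objects by type from inside a Python process. This is only an in-process approximation for illustration. It is not how OpenResty XRay works, since XRay inspects the unmodified process from the outside and follows full reference paths. The function name shallow_size_by_type is hypothetical.

```python
import gc
import sys
from collections import defaultdict


def shallow_size_by_type(top=10):
    """Return the top (type_name, total_shallow_bytes) pairs for GC-tracked objects."""
    totals = defaultdict(int)
    for obj in gc.get_objects():
        # The default of 0 avoids a TypeError for objects that cannot report a size.
        totals[type(obj).__name__] += sys.getsizeof(obj, 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


ranking = shallow_size_by_type()
for type_name, size in ranking:
    print(f"{size / 1024:10.1f} KB  {type_name}")
```

Note that sys.getsizeof reports shallow sizes only and gc.get_objects skips untracked objects such as plain strings, so the totals will be lower than what a full reference-path analysis finds.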
Below are more explanations and suggestions about the issue.
The explanation also mentions that this data reference path contains everything related to Python modules.
Look at the flame graph again, zoom in and find out which modules are taking up the most memory.
We can see these 1521 Python modules are combined together. Each of these modules takes less than 1% of the total memory.
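As a quick cross-check from inside a Python process, the interpreter keeps every loaded module in the sys.modules dictionary. The sketch below counts the entries and groups them by top-level package. The function name loaded_module_summary is made up for illustration, and it reports counts only, not the memory each module holds.

```python
import sys
from collections import Counter


def loaded_module_summary(top=5):
    """Return (total_count, most_common_top_level_packages) for this interpreter."""
    names = list(sys.modules)  # snapshot of the keys
    top_level = Counter(name.split(".")[0] for name in names)
    return len(names), top_level.most_common(top)


total, biggest = loaded_module_summary()
print(f"{total} modules loaded")
for package, count in biggest:
    print(f"{package}: {count} entries")
```

In a Django application you would run this inside the application process, for example from a management shell, to see a count comparable to the one in the flame graph.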
This module takes up a lot of memory. To see more details, click on it to zoom in.
You can see that the openpyxl.utils.cell module takes up more than 2.6MB of memory.
openpyxl is a Python library for manipulating Excel files.
Copy the module name.
Now open the terminal and use the find command to find the corresponding source code file in the project directory. The project folder is the root directory of the Python module installation location.
Paste the module name we just copied.
Here we abuse the dots in the module name as a wildcard for the
Copy the full file path.
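The same lookup can be sketched in Python. This version assumes the dots in the module name are treated as regular-expression wildcards, so each dot also matches a path separator. The function name find_module_sources is hypothetical, and the temporary directory only stands in for a real installation root.

```python
import os
import re
import tempfile


def find_module_sources(root, module_name):
    """Return .py paths under root whose path matches the dotted module name."""
    # Unescaped dots match any character, including the path separator.
    pattern = re.compile(module_name + r"\.py$")
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if pattern.search(path):
                matches.append(path)
    return sorted(matches)


# Demo on a throwaway tree that mimics the package layout.
with tempfile.TemporaryDirectory() as root:
    package_dir = os.path.join(root, "openpyxl", "utils")
    os.makedirs(package_dir)
    for name in ("cell.py", "units.py"):
        open(os.path.join(package_dir, name), "w").close()
    found = find_module_sources(root, "openpyxl.utils.cell")
    relative = [os.path.relpath(path, root) for path in found]

print(relative)
```

If you can run code in the same Python environment, importlib.util.find_spec("openpyxl.utils.cell").origin is another way to get the file path, at the cost of importing the parent packages.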
Use the vim editor to open the source file and look at the underlying Python code. You can use any editor you like.
Return to the flame graph.
The openpyxl module uses two dictionaries, _COL_STRING_CACHE and _STRING_COL_CACHE. They are used for converting between Excel cell coordinates and column names. These two variables use a lot of memory.
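To see why a pair of lookup dictionaries like these can add up to megabytes, here is a minimal sketch that builds two hypothetical caches and measures them. It is not the openpyxl source. The names NUMBER_TO_LETTERS and LETTERS_TO_NUMBER, the upper bound of column ZZZ, and the assumption that the caches map column numbers to letters and back are all illustrative choices.

```python
import sys
from string import ascii_uppercase


def column_letters(index):
    """Convert a 1-based column number to letters (1 -> 'A', 27 -> 'AA')."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = ascii_uppercase[remainder] + letters
    return letters


# Hypothetical stand-ins for two module-level caches, filled at import time.
NUMBER_TO_LETTERS = {}
LETTERS_TO_NUMBER = {}
for number in range(1, 18279):  # columns A through ZZZ (assumed bound)
    letters = column_letters(number)
    NUMBER_TO_LETTERS[number] = letters
    LETTERS_TO_NUMBER[letters] = number


def deep_size(mapping):
    """Size of a dict plus the shallow sizes of its keys and values."""
    return sys.getsizeof(mapping) + sum(
        sys.getsizeof(key) + sys.getsizeof(value) for key, value in mapping.items()
    )


for name, cache in (("NUMBER_TO_LETTERS", NUMBER_TO_LETTERS),
                    ("LETTERS_TO_NUMBER", LETTERS_TO_NUMBER)):
    print(f"{name}: {len(cache)} entries, about {deep_size(cache) / 1024:.0f} KB")
```

The two dictionaries share the same key and value objects, so adding the two numbers together overcounts. The point is only that eagerly filled module-level caches stay resident for the life of the process.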
Copy the name of the first variable.
Search for _COL_STRING_CACHE and you can locate the object. You can find all places in this file where this variable is used.
The _STRING_COL_CACHE variable can be found in the same way.
You can use the same method to analyze other modules on the flame graph.
Take the linecache module, for example. This is a Python standard library module that caches file contents. It uses 648KB of memory.
It has a cache variable that uses a lot of memory.
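The behavior of this cache is easy to observe in a small script. The sketch below reads one line from a temporary file through linecache, checks that the file now appears in linecache.cache, estimates the cached size, and then empties the cache with linecache.clearcache(). The helper cached_bytes is made up for illustration and counts only the cached source lines.

```python
import linecache
import os
import sys
import tempfile


def cached_bytes():
    """Approximate size of the source lines currently held by linecache."""
    total = 0
    for entry in linecache.cache.values():
        # Fully loaded entries are (size, mtime, lines, fullname) tuples.
        if len(entry) == 4:
            total += sum(sys.getsizeof(line) for line in entry[2])
    return total


with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("x = 1\ny = 2\n")
    path = f.name

try:
    first_line = linecache.getline(path, 1)  # loads the whole file into the cache
    was_cached = path in linecache.cache
    size_before = cached_bytes()
    linecache.clearcache()  # drops every cached file
    is_cached_after_clear = path in linecache.cache
finally:
    os.unlink(path)

print(first_line.rstrip(), was_cached, size_before, is_cached_after_clear)
```

Whether clearing the cache is worthwhile in a real Django process depends on how often tracebacks and other consumers ask for source lines again, so treat this as a way to inspect the cache rather than a recommendation.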
Python uses the Libc allocator and the mmap syscall to allocate memory. The report shows that the Libc allocator uses 18.7MB of memory.
OpenResty XRay can also monitor online processes automatically and show analysis reports.
Go to the “Insights” page.
You can find the daily and weekly reports on the “Insights” page.
For this reason, you don’t have to use the “Guided Analysis” feature. Guided analysis is useful for application development and demonstration purposes.
OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the contexts.
If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!
Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. And its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.