Pinpointing the Python Code Paths with High Disk I/O (using OpenResty XRay)
In this tutorial, you will learn how to use OpenResty XRay to quantitatively analyze high disk I/O issues in an online Python application. OpenResty XRay generates Python-level flame graphs for disk read/write count, latency, and throughput. With them, you can identify the Python code paths responsible for the most disk latency and the largest read/write volume. OpenResty XRay also points to the exact source lines involved, which helps you locate the root cause quickly and guides optimization.
Problem: high disk I/O
On the OpenResty XRay dashboard, we discovered a Python application named `order-service`. XRay's analysis found that it has disk I/O problems.
First, run the `ps` command to check this application.
We see the full command line here.
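Before attaching OpenResty XRay, you could optionally cross-check the process's cumulative I/O counters yourself. The sketch below makes several assumptions: a Linux host, per-task I/O accounting enabled, and permission to read `/proc/<pid>/io` for the target process (usually the same user or root). The helper names `parse_proc_io`, `read_proc_io`, and `read_rate_mb_per_s` are hypothetical. The sketch samples `read_bytes` twice and derives an average read rate. `read_bytes` counts the bytes the kernel fetched from the storage layer on the process's behalf.

```python
import time


def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip anything that is not a 'key: value' line
            counters[key.strip()] = int(value.strip())
    return counters


def read_proc_io(pid: int) -> dict[str, int]:
    """Read the I/O accounting counters of one process (Linux only)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


def read_rate_mb_per_s(pid: int, interval: float = 1.0) -> float:
    """Sample read_bytes twice and return the average read rate in MB/s.

    MB here means decimal megabytes (1,000,000 bytes).
    """
    before = read_proc_io(pid)["read_bytes"]
    time.sleep(interval)
    after = read_proc_io(pid)["read_bytes"]
    return (after - before) / interval / 1_000_000
```

For example, `read_rate_mb_per_s(12345, interval=5.0)` would report the average read rate of PID 12345 over five seconds; substitute the PID from the `ps` output. Unlike OpenResty XRay, this only tells you how much the process reads, not which Python code paths do the reading.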
Let’s use OpenResty XRay to check out this unmodified process. We can analyze it in real time and figure out what’s going on.
Spot the problematic Python code paths
Switch back to OpenResty XRay’s web console. Make sure you are viewing the correct machine.
If it is not, choose the right one from the list below.
Go to the “Guided Analysis” page.
Here, you can see different types of problems that you can diagnose.
Let’s select “High disk IO”.
Click on “Next”.
Select the Python application named `order-service`.
Select the process that we saw previously in the `ps` output.
Make sure that the application type is right. Usually, the default should be correct.
OpenResty XRay can analyze multiple language levels at the same time. We’ll keep both Python and C/C++ selected.
We can also set the maximum analyzing time. We’ll leave it as 300 seconds, which is the default value.
Let’s start analyzing.
The system will keep performing different rounds of analysis. Now, it’s executing the first round.
The first round is done, and it’s already on to the second one. That’s enough for this case.
Let’s stop analyzing.
We can see it automatically creates an analysis report.
This is the type of problem we diagnosed: “Disk I/O”.
This is the analysis of the disk read count.
This code path has the most frequent disk read operations.
`__read` is a function for reading data from a file.
`download_file` is a business-level function for downloading files.
Click to see more details.
This code path was automatically derived from this Python-land disk read count flame graph.
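Flame graphs are commonly built from "folded" stack samples. Each line holds one unique stack, with frames separated by semicolons and a sample count at the end. The sketch below assumes that format, which is not necessarily OpenResty XRay's internal representation. It shows one simple way to pick out the heaviest code path: sum the counts per full stack and take the largest. Apart from `download_file` and `__read`, the frame names are made up, and all sample counts are illustrative only.

```python
from collections import Counter


def hottest_path(folded_lines: list[str]) -> tuple[list[str], int]:
    """Return the frames (root first) and total sample count of the heaviest stack.

    Expects at least one line in the folded format "root;...;leaf COUNT".
    """
    totals = Counter()
    for line in folded_lines:
        # Split on the last space: everything before it is the stack.
        stack, _, count = line.rpartition(" ")
        totals[stack] += int(count)  # identical stacks are merged
    stack, count = totals.most_common(1)[0]
    return stack.split(";"), count


# Illustrative samples only; not data from the report above.
samples = [
    "main;handle_request;download_file;__read 900",
    "main;handle_request;render_index;__read 60",
    "main;handle_request;download_file;__read 40",
]
frames, count = hottest_path(samples)
print(" -> ".join(frames), count)  # main -> handle_request -> download_file -> __read 940
```

A real analysis tool may weigh partial paths or merge frames differently. This only illustrates the idea of ranking whole stacks by accumulated samples.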
Below are more detailed explanations and suggestions regarding this issue, including an explanation of the `__read` function we saw earlier.
Let’s go back to the code path. Hover the mouse over the green box for this function.
We can see its source file and the full path for this file in the tooltip.
The source line number is 10.
Click the icon to copy the full Python source file path for this function.
Use the vim editor to open the source file and look at the Python code in it. You can use any editor you like.
Check line 10, as OpenResty XRay suggested.
This line of code is reading data from a file.
This is the analysis of the disk read throughput.
It is the same code path that handles file downloads.
The read rate is close to 269 MB/s.
This is the analysis of disk read latency.
This code path is identical to the one we previously examined. It handles file downloads.
It is the sole source of hard disk read latency.
These are the files with the most disk reads.
The file with the highest read rate is `pic-ocean-wave.jpg`, at 193 MB/s.
These are the files with the most accumulated read latency.
The file with the highest read latency is `pic-ocean-wave.jpg`, accounting for about 75% of the total.
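The proportion here is presumably this file's share of the total accumulated read latency across all files. As a worked illustration with made-up numbers, not data from this report: if reads of one file accumulate 3.0 s of latency out of 4.0 s in total, its share is 3.0 / 4.0 = 75%. The hypothetical `latency_shares` helper below performs the same aggregation over per-read `(path, latency)` records. The file names other than `pic-ocean-wave.jpg` are invented.

```python
from collections import defaultdict


def latency_shares(records: list[tuple[str, float]]) -> dict[str, float]:
    """Map each file path to its fraction of the total accumulated read latency."""
    per_file: dict[str, float] = defaultdict(float)
    for path, latency_s in records:
        per_file[path] += latency_s  # accumulate latency per file
    total = sum(per_file.values())
    if total == 0:
        # Avoid dividing by zero when every recorded latency is zero.
        return {path: 0.0 for path in per_file}
    return {path: lat / total for path, lat in per_file.items()}


# Made-up records: (file path, latency of one read in seconds).
records = [
    ("pic-ocean-wave.jpg", 1.5),
    ("pic-ocean-wave.jpg", 1.5),
    ("pic-mountain.jpg", 0.6),
    ("doc-readme.txt", 0.4),
]
print(latency_shares(records)["pic-ocean-wave.jpg"])  # 0.75
```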
This is the analysis of the disk write count.
This code path is the sole source of disk write operations.
`write` is the function that writes data to a file.
`flush` is a function in the Python logging module that ensures all log messages have been completely written.
`emit` is a function in the logging module that handles log records.
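In CPython's `logging` module, `StreamHandler.emit()` writes each formatted record and then calls `flush()`. `FileHandler` inherits this behavior, so every log record can become its own write to the log file. If that per-record flushing is what drives the writes in this code path, one option to evaluate is flushing in batches. The `BatchedFileHandler` below is a hypothetical sketch, not part of the standard library. It flushes every `flush_every` records, and immediately for `ERROR` and above. The trade-off is that buffered lower-level records can be lost if the process crashes before the next flush.

```python
import logging


class BatchedFileHandler(logging.FileHandler):
    """Hypothetical FileHandler that flushes in batches instead of per record.

    Records stay in the file object's buffer until `flush_every` of them have
    accumulated, or until a record at ERROR level or above arrives. Anything
    still buffered is written when the handler is closed, because
    FileHandler.close() flushes the stream.
    """

    def __init__(self, filename: str, flush_every: int = 100, encoding: str = "utf-8"):
        # delay defaults to False, so the log file is opened right away.
        super().__init__(filename, encoding=encoding)
        self.flush_every = flush_every
        self._unflushed = 0

    def emit(self, record: logging.LogRecord) -> None:
        # Handler.handle() calls emit() while holding the handler's lock.
        if self.stream is None:
            # The handler was closed; fall back to the stock behavior.
            super().emit(record)
            return
        try:
            self.stream.write(self.format(record) + self.terminator)
            self._unflushed += 1
            if self._unflushed >= self.flush_every or record.levelno >= logging.ERROR:
                self.flush()  # one flush for the whole batch
                self._unflushed = 0
        except Exception:
            self.handleError(record)
```

You would attach it like any other handler, for example `logger.addHandler(BatchedFileHandler("app.log", flush_every=100))`. Both `app.log` and the batch size are illustrative values to tune against your durability needs.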
Automatic analysis and reports
OpenResty XRay can also monitor online processes automatically and generate analysis reports.
Go to the “Insights” page.
You can find the automatic reports for daily and weekly periods on the “Insights” page, so you don’t have to use the Guided Analysis feature.
Guided analysis is still useful for application development and demonstration purposes.
What is OpenResty XRay
OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context.
If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!
About The Author
Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.
Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Yahoo! and Cloudflare. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open-source experience. Yichun is well known in the open-source space as the project leader of OpenResty®, which has been adopted by more than 40 million website domains worldwide.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers among some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.