In this tutorial, you will learn how to diagnose the root causes of HTTP 504 Gateway Timeout errors on live OpenResty or Nginx servers. OpenResty XRay captures only the packets in the TCP connections that led to 504 errors at the application level, which minimizes the performance impact. This makes it ideal for production environments that are sensitive to performance overhead and latency. This is our powerful Smart Packet Capture technology.


Problem: HTTP 504 gateway timeout errors

First, let’s check the access log of the application.

Screenshot

As you can see, there are 504 errors.
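If you prefer to confirm this from a shell session instead of by reading the log, a short script can count status codes. The sketch below assumes the access log uses nginx’s default “combined” format; the function names and the sample log lines are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Assumes nginx's default "combined" log format:
#   $remote_addr - $remote_user [$time_local] "$request"
#   $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'^(?P<addr>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)


def count_statuses(lines):
    """Count how many log entries returned each HTTP status code."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:  # lines in other formats are silently skipped
            counts[int(m.group("status"))] += 1
    return counts


def gateway_timeouts(lines):
    """Yield (time, request line) for every entry with status 504."""
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("status") == "504":
            yield m.group("time"), m.group("request")


# Hypothetical log lines, for illustration only.
sample = [
    '192.0.2.7 - - [10/Oct/2023:13:55:36 +0000] "GET /api/items HTTP/1.1" 504 160 "-" "curl/8.0.1"',
    '192.0.2.7 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0.1"',
]

print(count_statuses(sample))
for when, request in gateway_timeouts(sample):
    print(when, request)
```

Against a real server, you would pass an open file object for the application’s access log instead of the `sample` list.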

Screenshot

Run the ps command to see the full command line for this application.

Screenshot

We can see that it is the nginx executable provided by the official OpenResty repository.

Screenshot

Use the Guided Analysis feature of OpenResty XRay to diagnose these errors

Let’s use OpenResty XRay to analyze these 504 response errors in real time and figure out the root cause.

Open the OpenResty XRay web console in your web browser.

Screenshot

Make sure you are looking at the right machine.

Screenshot

If the current machine is not the right one, choose the correct machine from the list below.

Screenshot

Go to the “Guided Analysis” page.

Screenshot

Here you can see different types of problems that you can diagnose.

Screenshot

Let’s select “Errors & exceptions”.

Screenshot

Click on “Next”.

Screenshot

Select the OpenResty application you saw earlier.

Screenshot

We do not select a specific process here.

Screenshot

Select “Whole application”.

Screenshot

Make sure the application type is correct. The default is usually right.

Screenshot

OpenResty XRay can analyze multiple language levels at the same time. We’ll keep both Lua and C/C++ selected.

Screenshot

We can also set the maximum analysis time. We’ll leave it at the default value of 300 seconds.

Screenshot

Let’s start analyzing.

Screenshot

The system keeps performing successive rounds of analysis. It is now executing the first round.

Screenshot

The first round is done, and the second round has already started. That is enough for this case.

Screenshot

Let’s stop analyzing.

Screenshot

It shows that the system is generating a report.

Screenshot

We can see it automatically created an analysis report.

Screenshot

This is the type of problem we are diagnosing: “Errors & Exceptions”.

Screenshot

The first issue is about the HTTP response status code 504.

Screenshot

The entire HTTP request took over 3 seconds.

Screenshot

The network packet with the largest delay relative to the previous packet is this TCP “push with ACK” packet, which has the PSH and ACK flags set.

Screenshot

The previous network packet is the ACK packet.

Screenshot

The delay between the two packets is more than 3 seconds.

Screenshot

The network packet with the largest delay was sent by the upstream server to the current server. The upstream server’s endpoint address is shown here.
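The “largest delay relative to the previous packet” idea can be made concrete with a few lines of code. This is a minimal sketch under stated assumptions: the `Packet` fields, the `"current"`/`"upstream"` labels, and all timestamps are hypothetical, and it is not how OpenResty XRay computes its report. It simply finds the packet with the largest gap from its predecessor and reports who sent it.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ts: float    # capture time in seconds
    sender: str  # "current" or "upstream" (hypothetical labels)
    flags: str   # TCP flags, e.g. "ACK" or "PSH,ACK"


def largest_gap(packets):
    """Return (index, delay) for the packet whose delay relative to
    the previous packet in the same connection is the largest."""
    if len(packets) < 2:
        raise ValueError("need at least two packets")
    best_i, best_delay = 1, packets[1].ts - packets[0].ts
    for i in range(2, len(packets)):
        delay = packets[i].ts - packets[i - 1].ts
        if delay > best_delay:
            best_i, best_delay = i, delay
    return best_i, best_delay


# Hypothetical connection to an upstream server (timestamps invented).
conn = [
    Packet(0.000, "current", "SYN"),
    Packet(0.001, "upstream", "SYN,ACK"),
    Packet(0.001, "current", "ACK"),
    Packet(0.002, "current", "PSH,ACK"),   # request sent to the upstream
    Packet(0.003, "upstream", "ACK"),      # upstream acknowledges it
    Packet(3.105, "upstream", "PSH,ACK"),  # response data arrives much later
]

i, delay = largest_gap(conn)
print(f"slowest: #{i + 1} {conn[i].flags} from {conn[i].sender}, "
      f"{delay:.3f}s after {conn[i - 1].flags}")
```

Only delays between consecutive packets of one connection are compared, which is why the analysis first narrows the capture down to a single TCP connection.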

Screenshot

You can also find the endpoint address of the current server.

Screenshot
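The two endpoint addresses identify the single TCP connection the analysis focuses on. As a rough illustration of what “only the packets in that connection” means, the sketch below filters an already-captured packet list down to traffic between two endpoints. It is a hypothetical post-hoc filter with invented field names and addresses (from the RFC 5737 documentation ranges). It does not describe how Smart Packet Capture works internally, which avoids capturing unrelated traffic in the first place.

```python
from dataclasses import dataclass
from typing import List, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class Packet:
    ts: float
    src: Endpoint
    dst: Endpoint
    flags: str


def in_connection(pkt: Packet, a: Endpoint, b: Endpoint) -> bool:
    """True if pkt travels between endpoints a and b, in either direction."""
    return (pkt.src, pkt.dst) in ((a, b), (b, a))


def connection_packets(packets: List[Packet], current: Endpoint,
                       upstream: Endpoint) -> List[Packet]:
    """Keep only the packets of the TCP connection current <-> upstream."""
    return [p for p in packets if in_connection(p, current, upstream)]


# Hypothetical addresses from the documentation ranges (RFC 5737).
current = ("192.0.2.10", 41000)
upstream = ("192.0.2.20", 8080)
other_client = ("198.51.100.5", 52000)

captured = [
    Packet(0.000, current, upstream, "PSH,ACK"),
    Packet(0.001, other_client, ("192.0.2.10", 80), "PSH,ACK"),  # unrelated
    Packet(0.003, upstream, current, "ACK"),
]
print(connection_packets(captured, current, upstream))
```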


Click “More” to see more details.

Screenshot

Here is the HTTP request URI.

Screenshot

The horizontal axis shows the serial number of each packet, starting from 1 and increasing monotonically.

Screenshot

The vertical axis shows each packet’s delay relative to the previous packet. The larger the value, the longer the delay.

Screenshot

The small squares represent the network packets sent out by the current server.

Screenshot

The small circles here represent network packets received by the current server.

Screenshot

Hover the mouse over the slowest packet to see its latency data. It also shows that this packet was received by the current server.
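The chart’s layout maps directly onto a simple data transformation. The sketch below, with a hypothetical `Packet` type and invented timestamps, turns a connection’s packets into `(serial, delay, marker)` points following the conventions just described: serial numbers start at 1, the delay is relative to the previous packet, squares are packets sent by the current server, and circles are packets it received. Plotting the first packet at a delay of 0 is an assumption, since it has no predecessor.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ts: float    # capture time in seconds
    sender: str  # "current" or "upstream" (hypothetical labels)


def chart_points(packets):
    """Return (serial, delay, marker) tuples matching the chart layout."""
    points = []
    prev_ts = None
    for serial, p in enumerate(packets, start=1):
        # Assumption: the first packet has no predecessor, so plot it at 0.
        delay = 0.0 if prev_ts is None else p.ts - prev_ts
        # Squares: sent by the current server. Circles: received by it.
        marker = "square" if p.sender == "current" else "circle"
        points.append((serial, delay, marker))
        prev_ts = p.ts
    return points


conn = [
    Packet(0.000, "current"),
    Packet(0.002, "upstream"),
    Packet(3.050, "upstream"),
]
points = chart_points(conn)
slowest = max(points, key=lambda pt: pt[1])  # the point you would hover over
print(slowest)
```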

Screenshot

Based on the analysis above, the system concluded that the current server triggered a timeout error while waiting for the network packet from the upstream server.

Screenshot

It lists the possible root causes of this error: a slow upstream server,

Screenshot

or a slow network link between the current server and the upstream server.

Screenshot

Or the current server’s timeout setting for that upstream server is too short.

Screenshot

It also gives detailed suggestions.

Screenshot

Here is another type of HTTP 504 error: the current server actively closed the connection after the upstream server had not responded for a long time.

Screenshot

Here you can clearly see that the TCP packet with the FIN and ACK flags has the largest delay with respect to the previous packet. That means the current server actively closed the connection.

Screenshot

So the conclusion is that the current server’s timeout protection kicked in while it was waiting for a network packet from the upstream server, so it gave up waiting and closed the connection.
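The two 504 patterns in this tutorial differ mainly in who sent the slowest packet and which TCP flags it carried. The toy rule below captures that distinction. The function name, its labels, and the rule itself are hypothetical simplifications of the conclusions quoted above, not OpenResty XRay’s actual diagnosis logic, which also takes other data into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    sender: str       # "current" or "upstream" (hypothetical labels)
    flags: frozenset  # TCP flags, e.g. frozenset({"FIN", "ACK"})


def classify_504(slowest: Packet) -> str:
    """Toy rule mapping the slowest packet to one of the two 504 patterns."""
    if slowest.sender == "upstream":
        # Pattern 1: the long wait ended with a packet from the upstream.
        return "timed out waiting for a packet from the upstream server"
    if slowest.sender == "current" and {"FIN", "ACK"} <= slowest.flags:
        # Pattern 2: the long wait ended with the current server's own FIN.
        return "gave up waiting and closed the connection to the upstream server"
    return "unclassified"


print(classify_504(Packet("upstream", frozenset({"PSH", "ACK"}))))
print(classify_504(Packet("current", frozenset({"FIN", "ACK"}))))
```

In both patterns the suggested root causes are the same (a slow upstream, a slow network link, or a timeout that is too short), which is why the rule only needs to tell the two shapes apart.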

Screenshot

The system also lists the possible root causes of this error: a slow upstream server,

Screenshot

or a slow network link between the upstream and your current server.

Screenshot

Or maybe the timeout setting in the current server for that upstream server is too short.

Screenshot

Automatic analysis and reports

OpenResty XRay can also monitor live processes automatically and present analysis reports. Go to the “Insights” page.

Screenshot

On the “Insights” page, you can find automatically generated reports for daily and weekly periods.

Screenshot

With these automatic reports, you don’t have to use the “Guided Analysis” feature.

Screenshot

Guided analysis is useful for application development and demonstration purposes.

What is OpenResty XRay

OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context.

If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo! He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open-source experience. Yichun is well known in the open-source space as the project leader of OpenResty®, which is used by more than 40 million website domains worldwide.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. And its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.