OpenResty XRay is a performance analysis and troubleshooting platform based on our improved dynamic tracing technologies. It can analyze and diagnose various technology stacks and software, including the Linux kernel, Go, OpenResty/Nginx, PHP, Python, Perl, etc. While developing and testing OpenResty XRay, we often find bugs in open-source software and toolchains. After all, OpenResty XRay pushes dynamic-tracing technology to an unprecedented level. Our analysis tools for various technology stacks and software are much more complex than the open-source tools targeting SystemTap, DTrace, and eBPF. As a result, our test suite often puts unusually heavy pressure on the Linux kernel, eBPF, LLVM, SystemTap, GDB, etc., and that pressure naturally exposes deeper bugs.

In this article, we introduce two bugs in the tracing subsystem of the Linux kernel. These two bugs affect all tools and frameworks that involve dynamic tracing on Linux. Fortunately, OpenResty XRay can work around one of the bugs and protects our customers from running into the other.

Kernel deadlocks in user-space memory reading

The official mainline Linux kernel has a relatively severe regression in the 5.19 through 6.2 series. A kernel developer for the mm subsystem changed the implementation of user-space memory reading, which results in a deadlock when the tracing subsystem reads user-space memory in NMI context (such as in perf events). The regression was fixed in the 6.3 series. The open-source eBPF toolchains (including perf and bcc) are all affected. We submitted a patch to the open-source SystemTap to bypass this bug and worked around it in our OpenResty XRay platform’s eBPF+ runtime. This bug can be reproduced by running our OpenResty XRay test suites.

Therefore, we must be more active in testing the latest mainline kernel in the future; otherwise, a severe regression like this one can persist across many kernel release series. Alas.

In our experience, Ubuntu and Debian usually do not backport such fixes. In contrast, Red Hat backports patches very diligently (of course, sometimes Red Hat also backports problematic patches, which introduces new bugs, as we shall see below).

Data races in x86 breakpoint insertion of the kernel

The official mainline Linux kernel has a critical regression in the 5.2 through 5.4 series. A developer was refactoring the dynamic breakpoint insertion code on x86 (including x86_64) and had a brain cramp. He did not consider concurrent and parallel access at all. He wrote the code as if it were single-threaded, even though it runs in a multi-threaded scenario, and read and wrote global variables in the kernel context without any protection, which is equivalent to running naked.
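To make this class of bug concrete, here is a minimal sketch in Python. It is not the kernel code, and the names `PatchState`, `lost_update_demo`, and `locked_update_demo` are hypothetical, chosen only for illustration. The first function replays, step by step, the interleaving in which two CPUs read a shared global before either writes it back, so one update is lost. The second shows the same update guarded by a lock. The kernel uses its own synchronization primitives; the lock here is only a stand-in for "some form of protection".

```python
import threading


class PatchState:
    """Hypothetical stand-in for a piece of shared global state."""

    def __init__(self):
        self.pending = 0
        self.lock = threading.Lock()


def lost_update_demo():
    """Replay an unprotected read-modify-write interleaving by hand."""
    state = PatchState()
    # Both "CPUs" read the shared value before either writes it back.
    seen_by_cpu0 = state.pending
    seen_by_cpu1 = state.pending
    state.pending = seen_by_cpu0 + 1
    state.pending = seen_by_cpu1 + 1  # overwrites the update from CPU 0
    return state.pending


def locked_update_demo(workers=4, rounds=10000):
    """Perform the same update from several threads, guarded by a lock."""
    state = PatchState()

    def worker():
        for _ in range(rounds):
            with state.lock:  # the read-modify-write is now atomic
                state.pending += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.pending


if __name__ == "__main__":
    print("unprotected, two increments:", lost_update_demo())
    print("locked, 4 x 10000 increments:", locked_update_demo())
```

Two increments in the unprotected replay leave the counter at 1 instead of 2, while the locked variant always reaches `workers * rounds`. The more CPUs that touch the shared state at once, the more often such an interleaving can occur.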

One funny thing is that the originally unaffected 3.10 kernel of CentOS/RHEL 7 also became affected because Red Hat diligently backported this problematic patch. Fortunately, after a while, Red Hat also diligently backported the fix for this regression. As usual, the 5.2 ~ 5.4 kernels of Ubuntu and Debian do not get the fix backported. Fortunately, this bug mostly shows up on machines with many CPU cores. We blacklisted all kernels with this bug in our OpenResty XRay product, but users can forcibly bypass our protection if desired.
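As a rough illustration of such a kernel blacklist with a force override, here is a minimal sketch in Python. It is not how OpenResty XRay implements its check; the names `AFFECTED_RANGES`, `parse_release`, `is_denylisted`, and `may_trace` are hypothetical. It matches only on the mainline `major.minor` version, so it would miss distribution kernels that carry the backported patch under another version number (such as the RHEL 7 3.10 kernel), and it would wrongly flag any 5.2 ~ 5.4 kernel that has received the fix.

```python
import platform
import re

# Mainline series named in this article for the breakpoint-insertion
# regression; both ends are inclusive.
AFFECTED_RANGES = [((5, 2), (5, 4))]


def parse_release(release):
    """Extract (major, minor) from a release string like '5.4.0-42-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % (release,))
    return int(m.group(1)), int(m.group(2))


def is_denylisted(release):
    """Return True if the release falls inside an affected mainline range."""
    version = parse_release(release)
    return any(lo <= version <= hi for lo, hi in AFFECTED_RANGES)


def may_trace(release, force=False):
    """Allow tracing unless the kernel is denylisted; force overrides."""
    return force or not is_denylisted(release)


if __name__ == "__main__":
    release = platform.release()
    verdict = "allowed" if may_trace(release) else "denylisted"
    print(release, verdict)
```

A real check would need per-distribution knowledge of which kernel builds carry the problematic patch and which carry the fix, which a plain version comparison cannot provide.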

This bug affects all x86 tools that involve dynamic tracing, including eBPF, SystemTap, DTrace, etc. There is no (known) way to bypass it because it is too low-level: it sits directly at the level of architecture-specific machine instructions.

Conclusion

In this article, we covered two tracing bugs in the Linux kernel that we discovered using OpenResty XRay. These bugs have far-reaching consequences and affect many tools and frameworks that rely on dynamic tracing technology. We have explained the causes and effects of these bugs and when they got fixed in the mainline kernel. We hope this article can help other developers and users who encounter similar problems and raise awareness of the importance of testing and updating the kernel regularly.

What is OpenResty XRay

OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language, which targets various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context.

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open-source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. And its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.