OpenResty XRay is a precision performance profiling tool designed for high-performance systems. It helps software engineers capture complete performance profiles with very low overhead and without disrupting production operations, so that function-level performance bottlenecks can be identified more quickly. As systems grow in both scale and complexity, XRay uses sampling techniques to track program execution paths with millisecond-level precision, giving engineers back insight into and control over their systems. Unlike traditional profiling tools, it lets developers analyze performance issues without taking machines offline, modifying code, or restarting services, and it points to clear directions for optimization.

With its powerful call stack analysis capabilities, OpenResty XRay can reveal hidden performance issues in complex systems, helping developers understand the true sources of resource consumption. Whether for CPU-intensive applications or I/O-intensive services, XRay provides deep and accurate performance insights.

What is LLVM/clang

LLVM (the name originally stood for Low Level Virtual Machine, though the project no longer treats it as an acronym) is a collection of modular, reusable compiler and toolchain technologies, while Clang is a compiler frontend for the C, C++, and Objective-C programming languages built on the LLVM architecture. As a cornerstone of modern compilation technology, LLVM/clang is widely used across development environments and operating systems, and its performance directly affects developers’ daily productivity and final product quality.

However, in certain working modes, LLVM/clang produces excessively large output files and processes them inefficiently. This increases storage requirements and lengthens compilation cycles, which hurts the development experience.

OpenResty XRay Analysis Process

Recently, our team conducted an in-depth analysis of LLVM/clang using OpenResty XRay and successfully submitted a critical performance optimization patch. In specific scenarios, this patch shrinks output files by a factor of 5 to 6 while cutting overall runtime by nearly 25%. Improvements of this size demonstrate the value that professional performance analysis tools can provide when addressing challenges in today’s complex systems.

Through the detailed analysis reports generated by OpenResty XRay, we clearly identified performance bottlenecks in LLVM/clang under specific modes. The analysis results showed that there were numerous redundant operations during the output file generation process, which not only consumed substantial CPU resources but also led to bloated output files. With XRay’s call graph and hotspot function analysis capabilities, we precisely located the problematic code paths in a very short time.
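To make the idea of hotspot analysis concrete, here is a minimal sketch of how a sampling profiler in general can turn captured call stacks into per-function hotspot counts. It is not XRay's implementation, and the stack samples and function names below are hypothetical, invented purely for illustration rather than taken from LLVM/clang.

```python
from collections import Counter

# Hypothetical sampled call stacks (outermost frame first). The function
# names are invented for illustration and are not taken from LLVM/clang.
SAMPLES = [
    ("main", "compile", "emit_output", "write_record"),
    ("main", "compile", "emit_output", "write_record"),
    ("main", "compile", "emit_output", "build_table"),
    ("main", "compile", "parse"),
    ("main", "compile", "emit_output", "write_record"),
]


def hotspots(samples):
    """Return (self_counts, total_counts) over the sampled stacks.

    self_counts: samples in which the function was the innermost frame.
    total_counts: samples in which the function appeared anywhere on the stack.
    """
    self_counts = Counter()
    total_counts = Counter()
    for stack in samples:
        self_counts[stack[-1]] += 1
        # Use a set so a recursive function is counted once per sample.
        for func in set(stack):
            total_counts[func] += 1
    return self_counts, total_counts


self_counts, total_counts = hotspots(SAMPLES)
for func, count in self_counts.most_common():
    share = 100.0 * count / len(SAMPLES)
    print(f"{func}: {count} samples ({share:.0f}% self time)")
```

In this toy data set, the innermost frame that appears most often is the hotspot candidate, while the total counts show which callers lead to it. A real profiler works on far more samples and on symbolized native stacks, but the aggregation step follows the same pattern.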

Optimization Implementation and Results

Based on the precise analysis data provided by OpenResty XRay, we designed and implemented a targeted patch. This patch primarily optimizes LLVM/clang’s code generation strategy in specific modes by eliminating unnecessary intermediate representations and optimizing redundant data structures, significantly improving system efficiency.

While implementing this patch, we paid special attention to preserving the integrity and correctness of LLVM/clang’s original functionality. This was like performing surgery on a precisely operating machine, where the slightest mistake could introduce new problems. Through rigorous testing and validation, we confirmed that the patch significantly improves performance while maintaining output quality.

When we applied this patch to the actual environment, LLVM/clang’s performance in specific modes was significantly improved:

  • Output file size reduced by a factor of 5 to 6, greatly saving storage space
  • Total runtime decreased by nearly 25%, improving compilation efficiency
  • Peak memory usage also showed a notable decrease, reducing system resource pressure
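As a quick sanity check on what these figures imply, the following sketch converts a reduction factor into the fraction of the original size that remains, and a runtime cut into the equivalent speedup. The helper names are hypothetical, and the inputs are the rounded figures quoted above, not additional measurements.

```python
def remaining_fraction(reduction_factor):
    """Fraction of the original size left after an N-fold reduction."""
    return 1.0 / reduction_factor


def speedup_from_runtime_cut(runtime_cut):
    """Speedup implied by cutting runtime by the given fraction (0.25 = 25%)."""
    return 1.0 / (1.0 - runtime_cut)


# Rounded figures quoted in the text: a 5x to 6x size reduction
# and a runtime decrease of nearly 25%.
for factor in (5, 6):
    left = remaining_fraction(factor)
    print(f"{factor}x smaller: {left:.1%} of original size, {1 - left:.1%} saved")

print(f"25% less runtime: {speedup_from_runtime_cut(0.25):.2f}x speedup")
```

In other words, a 5x to 6x reduction leaves roughly one-fifth to one-sixth of the original output size, and a 25% shorter runtime corresponds to about a 1.33x speedup.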

These improvements not only optimized developers’ daily workflows but also brought performance benefits to various applications that depend on LLVM/clang.

Conclusion

Through this LLVM/clang optimization effort guided by the OpenResty XRay analyzer, we not only achieved significant performance improvements but also demonstrated the crucial role of professional performance analysis tools in optimizing complex systems. With its precision and low overhead, OpenResty XRay helps developers identify performance issues that are difficult to detect using traditional methods, providing reliable evidence for system optimization.

In addition to XRay, OpenResty Inc. also offers comprehensive private library services covering technical needs across various industries. These private libraries have significant advantages in performance optimization, security protection, and data processing, helping enterprises quickly build high-performance, highly reliable application systems. Whether in finance, e-commerce, or media industries, OpenResty’s private libraries can provide tailored solutions to meet specific requirements in different scenarios.

The OpenResty Inc. team will continue to develop and improve the XRay toolkit and private library services, helping developers and enterprises discover and solve various performance bottlenecks. We believe that through precise performance analysis and targeted optimization, many seemingly unimprovable performance issues can achieve breakthrough solutions.

If your team is also facing similar performance challenges, consider trying OpenResty XRay and private library services, which may bring unexpected efficiency improvements and technical breakthroughs.

What is OpenResty XRay

OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language, which targets various runtimes such as Stap+, eBPF+, GDB, and ODB, depending on the context.

If you like this tutorial, please subscribe to this blog site and/or our YouTube channel. Thank you!

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing”, and “machine coding”, with over 22 years of programming experience and 16 years of open-source experience. Yichun is well known in the open-source space as the project leader of OpenResty®, which has been adopted by more than 40 million global website domains.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.