Dynamic tracing is an exceptional tool for delving into the intricacies of complex online software systems. It allows us to pinpoint the root causes of various issues within these systems without modifying the software itself. Our toolkit in OpenResty XRay includes the C-compatible Y language (Ylang), which enables the use of standard C code to trace target C programs efficiently. Tools written in Ylang, once compiled with our ylang optimizing compiler, become dynamic-tracing analyzers compatible with eBPF+ [1], Stap+ [2], or GDB. However, a gap exists when it comes to C++ applications, which are prevalent in the real world and include widely used open-source software like the HotSpot JVM and NodeJS.

To bridge this gap, we have developed a custom C++ compiler, dubbed Y++, designed to emit Ylang code or a superset of GNU C. Remarkably, Y++ already supports nearly the entire C++ language specification up to C++23. This article will demonstrate some of Y++’s capabilities using straightforward examples.

Setting Up the Target C++ Program

We will employ a concise yet complete C++ program as our tracing target. This program utilizes the STL container std::vector in its global variable int_vec.

/* target.cxx: A sample C++ program for tracing */

#include <vector>
#include <cstdio>

std::vector<int> int_vec;

int main(void) {
    int sum = 0;
    std::vector<int>::iterator it;

    for (int i = 1; i <= 3; i++) {
        int_vec.push_back(i * 2);
    }
    for (it = int_vec.begin(); it != int_vec.end(); ++it) {
        sum += *it;
    }
    return sum - 12;
}

This program can be compiled using a standard C++ compiler command:

g++ -g -O2 target.cxx -o target

Options such as -fPIC -pie could also be included if necessary.

Executing the target program confirms its intended functionality:

$ ./target; echo $?
0
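The exit status of 0 follows from the arithmetic: the first loop pushes 2, 4, and 6, whose sum is 12, so `sum - 12` evaluates to 0. As a minimal sketch, the same computation can be written as a standalone function. The name `exit_status_for` and its parameter `n` are hypothetical and do not appear in the target program.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper mirroring target.cxx: fill a vector with
// 2, 4, ..., 2*n and return (sum - 12), which is the value main()
// returns in the target program when n == 3.
int exit_status_for(int n) {
    std::vector<int> v;
    for (int i = 1; i <= n; i++) {
        v.push_back(i * 2);
    }
    return std::accumulate(v.begin(), v.end(), 0) - 12;
}
```

Calling `exit_status_for(3)` reproduces the target's return value; any other element count would yield a nonzero status.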

In the section below, we will proceed to craft an external C++ code snippet to trace this process dynamically.

Crafting the C++ (or Y++) Analyzer

We will develop a dynamic tracing analyzer utilizing the eBPF+ or Stap+ toolchains. This analyzer will monitor the global C++ variable int_vec and output all its elements in real time. Below is the analyzer.yxx tool for this purpose. (The .yxx file extension signifies a Y++ source file.)

/* analyzer.yxx: A dynamic tracing tool for C++ */

#include <vector>
#include <cstdio>

_target std::vector<int> int_vec;

static void probe_handler(void) {
    for (auto it = int_vec.begin(); it != int_vec.end(); ++it) {
        printf("%d\n", *it);
    }
}

_probe main() -> int {
    probe_handler();
}

This code snippet uses a few extensions to the standard C++ language. The _target keyword marks a declaration as referring to a symbol in the target program; here it declares the int_vec variable as found in the target application. The _probe keyword declares a dynamic probe point, akin to constructs in our Y language; here it defines a probe point at the return of the main function in the target program. Note that the tracer code includes the standard headers vector and cstdio itself, which gives it the std::vector iterators and printf it needs to output the values of int_vec from the target program.
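Apart from the _target and _probe extensions, the handler body is ordinary C++. As a rough in-process stand-in (no tracing is involved, and this is not Y++ code), the same iterator loop can be compiled with a standard C++ compiler against a local vector. The function name `dump_elements` is hypothetical, and it returns a string rather than calling printf so the result is easy to inspect.

```cpp
#include <string>
#include <vector>

// Hypothetical in-process stand-in for probe_handler(): walks the vector
// with the same iterator loop and collects one decimal value per line
// instead of printing it.
std::string dump_elements(const std::vector<int>& vec) {
    std::string out;
    for (auto it = vec.begin(); it != vec.end(); ++it) {
        out += std::to_string(*it);
        out += '\n';
    }
    return out;
}
```

The difference in the real analyzer is where the data lives: there, int_vec refers to memory in the traced process, while this sketch only reads a vector in its own address space.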

Compiling the analyzer.yxx file into Ylang code is achieved with our y++ compiler:

y++ -o analyzer.y analyzer.yxx

Subsequently, we can use the Ylang toolchain to compile this analyzer into executable tools for various platforms, such as eBPF+, Stap+, or GDB. Here’s an example using eBPF+:

ylang --gen-bpf --symtab debuginfo.jl analyzer.y

The debuginfo.jl file is automatically generated from the DWARF debug symbols within the target binary, which requires the -g compile option. We shall see how we can relax this requirement later in this article.

The ylang command produces two files for the eBPF+ toolchain: analyzer.bpf.c for kernel space and analyzer.ubpf.c for userland.

These generated C source files can be compiled into the eBPF bytecode and an executable analyzer for loading the eBPF bytecode, respectively, using the clang and gcc toolchains (we also enhanced the clang compiler’s eBPF backend).

Operationalizing the Target and Analyzer

The analyzer executable, generated as described, can initiate the target program and load the eBPF program for analysis, exemplified as follows:

$ ./analyzer ~/work/target
Start tracing...
2
4
6

This output confirms the analyzer’s ability to dynamically trace and output the values of the global variable int_vec from the target program. For target processes that are already running, the analyzer can attach by PID instead:

./analyzer -p 12345 -t 3

This command traces the process with PID 12345 for at most 3 seconds.

While the process described may appear complex, our OpenResty XRay product can automate the entire workflow.

Advancing Support for Complex C++ Applications

We already support many C++ features, such as class inheritance, virtual method tables, and dynamic memory allocation using the new and delete operators.
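For readers less familiar with these features, the following plain C++ sketch exercises all three in a few lines. It is only an illustration of the language features named above: the article does not show Y++ compiling this snippet, and the names `Shape`, `Circle`, and `describe_dynamic` are hypothetical.

```cpp
#include <string>

// Hypothetical class hierarchy, for illustration only.
struct Shape {
    virtual ~Shape() {}
    virtual std::string name() const { return "shape"; }
};

// Class inheritance: Circle derives from Shape and overrides name().
struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Allocates a derived object with new, calls a virtual method through a
// base-class pointer (dispatched via the virtual method table), and then
// releases the object with delete.
std::string describe_dynamic() {
    Shape* s = new Circle();
    std::string result = s->name();
    delete s;
    return result;
}
```

The call through the `Shape*` pointer resolves to `Circle::name()` at run time, which is the kind of dispatch a tracer has to reproduce when it walks objects in a target process.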

Efforts are underway to support more intricate C++ code in tracers, such as C++ code snippets taken from the HotSpot JVM, NodeJS, or MySQL. This will enable the creation of noninvasive dynamic tracing analyzers for the JVM, Java programs, and a broader array of C++ applications. We are progressing rapidly in this area and encourage you to stay tuned for updates.

About the Debug Symbols

Debug symbols are essential for tracing (or debugging) binary executable programs, which is why the target must be compiled with the -g option of the g++ command. Enabling debug symbols is not the same as making a debug build, which adds debugging code that may slow down the application. Importantly, debug symbols do not incur runtime overhead and are not loaded into memory during process execution.

In real-world scenarios, debug symbols may be missing or the target programs may not have been compiled with the -g option. Fortunately, OpenResty XRay’s proprietary AI and machine learning algorithms can regenerate debug symbols from stripped ELF binaries for popular open-source software like Nginx, LuaJIT, and Python, among many others, through our internal symgen tool. We also offer custom model training for client-specific programs, effectively bringing the power of dynamic tracing to almost every process out there.

Feel free to watch our YouTube video to see the magic in action.

Conclusion

The development and deployment of Y++, our custom C++ compiler, signifies a pivotal advancement in dynamic tracing. By extending dynamic tracing’s capabilities to C++ applications, we not only bridge a significant gap but also pave the way for more sophisticated analysis and observability techniques. The examples provided in this article merely scratch the surface of what is possible with Y++ and our dynamic tracing toolchain, including eBPF+ and Stap+.

As we refine our tools and expand their capabilities, we anticipate a future where developers and system administrators can gain unprecedented insights into their applications, leading to more reliable and performant software. Whether you are troubleshooting a complex issue in a large-scale distributed system or seeking to optimize your software’s performance, the tools and methodologies discussed herein offer a robust framework for achieving your objectives without the need for invasive code modifications.

Furthermore, our ongoing endeavors to support more intricate C++ applications, including those within the HotSpot JVM, underscore our dedication to innovation and the wider software development community. By leveraging the power of dynamic tracing, in conjunction with our proprietary technologies like OpenResty XRay, we are at the forefront of crafting solutions that tackle real-world challenges faced by developers today.

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at internationally renowned tech companies such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing”, and “machine coding”, with over 22 years of programming and 16 years of open-source experience. Yichun is well known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.


  1. eBPF+ is our enhanced version of eBPF that operates without requiring patches to existing Linux kernels.

  2. Stap+ is our advanced version of SystemTap, offering extended functionalities.