In high-concurrency gateway scenarios, directly writing business events to a message system at the request entry stage often implies that this path must meet extremely stringent performance requirements: overhead must be sufficiently low, and latency fluctuations must be well-controlled. Any unexpected execution cost will be magnified, impacting the overall request latency.

In practical engineering, many teams find that existing general-purpose writing solutions are not always ideal for this specific role: they prioritize overall throughput rather than predictable latency for individual requests. This difference is not apparent in background batch-processing scenarios, but once it enters the gateway’s critical path, it can become a bottleneck for performance optimization.

lua-resty-kafka-fast is designed precisely for this type of scenario. It decouples Kafka writes from the critical path of request processing, enabling the gateway to submit events quickly while maintaining its primary processing pace. For the caller, the latency of the write operation becomes more stable and easier to incorporate into the overall latency budget. With this uncertainty mitigated, data can be captured securely at the system entry point for real-time decision-making, behavior analysis, or distributed tracing, without imposing an additional burden on the main process.

Three Common Anti-Patterns We See in Production

In production environments, engineering teams often adopt “seemingly viable” solutions to expedite data ingestion into Kafka. However, as these solutions scale and traffic grows, they almost invariably expose significant performance and stability issues. The following three practices are anti-patterns we have repeatedly observed across production environments, all of which ultimately had to be replaced.

1. The Kafka Proxy Service

This is a common and easily implemented approach: deploying a standalone microservice adjacent to the gateway, dedicated to interacting with Kafka. The gateway then forwards data to this service via HTTP or RPC.

  • Additional Hops: Introduces an extra network round trip, inevitably increasing latency.
  • New Failure Points: Requires handling additional inter-service calls, retries, timeouts, and data consistency challenges.
  • Operational Overhead: Necessitates deploying, monitoring, and maintaining an additional component.
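For illustration, the gateway side of this pattern typically looks like the following sketch, written with the open-source lua-resty-http client. The proxy service name, port, and endpoint are hypothetical placeholders, not part of any real deployment:

```lua
-- Anti-pattern sketch: forwarding events to a separate Kafka proxy service.
-- Uses the open-source lua-resty-http client; the proxy URL is hypothetical.
local http = require "resty.http"
local cjson = require "cjson.safe"

local function forward_event(event)
    local httpc = http.new()
    httpc:set_timeout(200)  -- even a tight timeout adds up on the critical path

    local res, err = httpc:request_uri("http://kafka-proxy.internal:8080/events", {
        method = "POST",
        body = cjson.encode(event),
        headers = { ["Content-Type"] = "application/json" },
    })
    if not res or res.status ~= 200 then
        -- a second failure domain: the proxy itself can be down or slow
        ngx.log(ngx.ERR, "proxy write failed: ", err or res.status)
    end
end
```

Every event now pays for JSON serialization, an extra TCP round trip, and a second service's availability before it ever reaches Kafka.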

2. Using Timers for Performance Optimization

In OpenResty environments, some developers attempt to use ngx.timer.at to offload Kafka write operations to the background, aiming to reduce their impact on the main request processing path. However, this approach often introduces new challenges in practice:

  • Poorly maintained code makes it difficult to pick up community fixes and patches in a timely manner, leaving users to shoulder stability risks in production.
  • Incomplete protocol implementations are prone to unpredictable behavior under high-throughput or unstable network conditions.
  • The asynchronous sending mechanism implemented by timers lacks proper flow control between the client and server, potentially causing nginx to run out of memory.
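The timer-offload pattern usually looks like the sketch below, shown here with the open-source lua-resty-kafka client for illustration. The topic name and broker address are placeholders:

```lua
-- Anti-pattern sketch: offloading the Kafka write to a zero-delay timer.
-- Uses the open-source lua-resty-kafka client; broker/topic are placeholders.
local producer = require "resty.kafka.producer"

local broker_list = { { host = "127.0.0.1", port = 9092 } }
local payload = ngx.var.request_uri  -- example event body

local ok, err = ngx.timer.at(0, function(premature, msg)
    if premature then return end
    local p = producer:new(broker_list, { producer_type = "async" })
    -- No backpressure: under load, pending timers and buffered messages
    -- accumulate in worker memory, with nothing to slow the request path down.
    local sent, send_err = p:send("events", nil, msg)
    if not sent then
        ngx.log(ngx.ERR, "kafka send failed: ", send_err)
    end
end, payload)
if not ok then
    ngx.log(ngx.ERR, "failed to create timer: ", err)
end
```

Because the request handler returns before the timer runs, nothing ever signals "slow down" back to the entry point; memory growth is the only pressure valve.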

3. Custom Kafka Protocol Implementation

This path is chosen by only a few teams and represents the highest risk. To minimize intermediate layers, some teams attempt to implement the Kafka protocol directly on the gateway side, retaining only minimal viable functionality. The challenges with this approach typically manifest as:

  • Protocol Complexity: The Kafka protocol is significantly more complex than it appears, encompassing broker discovery, partition rebalancing, error handling, and more. A complete implementation requires substantial effort.
  • Version Drift: Upgrades to the Kafka cluster can render your custom implementation incompatible.
  • Troubleshooting Difficulty: Such bespoke clients are exceptionally challenging to debug and maintain, posing a significant operational risk in production environments. When issues arise, it becomes exceedingly difficult to quickly pinpoint the root cause, whether it lies within the protocol implementation, cluster status, or the data itself.

Why This Problem Is Fundamentally Hard

Viewed as a whole, these solutions are not inherently “unworkable”; rather, they introduce excessive overhead on the critical path:

  • Data requires more intermediate steps
  • Write completion time becomes unpredictable
  • Performance optimization potential is inherently constrained by the system architecture itself
  • Operational and troubleshooting costs grow linearly with system scale

This transcends the scope of any specific language or component; it is a holistic efficiency challenge.

What Is lua-resty-kafka-fast

The goal of lua-resty-kafka-fast is straightforward: to perform Kafka writes in OpenResty with minimal overhead, while maintaining the simplicity and predictable performance of Lua code.

At the usage level, developers interact with an intuitive, straightforward Lua API; at the execution level, all time-consuming operations are properly isolated, ensuring they do not interfere with OpenResty’s normal scheduling process. These two aspects are decoupled through clear engineering boundaries, allowing for both code readability and robust system performance to be achieved simultaneously. We formalize this approach as an engineering contract: API semantics remain simple, and performance characteristics remain stable.

lua-resty-kafka-fast is the complete implementation built around this contract.

  • Independent Execution Environment: Kafka-related processing logic runs in dedicated execution units, directly driven by the librdkafka C library. Communication between Lua and this environment occurs via efficient internal queues, keeping the request path lightweight and controllable.
  • Predictable Resource Usage Model: Configurations like lua_kafka_max_clients precisely control the number of background worker units and client instances, thereby preventing resource abuse.
  • Zero-Copy Data Path: When messages are passed between Lua and C, existing memory structures are reused as much as possible, reducing unnecessary data copying and lowering the fixed cost of each call.
  • Lua GC-Integrated Memory Management: Through clever design, the lifecycle of memory allocated by the C library is bound to Lua objects, allowing Lua’s garbage collector to automatically reclaim it and inherently prevent memory leaks.
  • Throughput-Optimized Batch Interface: Supports sending multiple messages in a single API call, significantly reducing the call overhead between Lua and C, and boosting overall throughput.
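As a purely illustrative sketch of how these pieces fit together, usage could look something like the following. The module path, constructor options, and method names below are hypothetical and do not come from this article; only the lua_kafka_max_clients directive is named above. Consult the official API reference for the actual interface:

```lua
-- Hypothetical usage sketch; real API may differ. See the official docs.
-- In nginx.conf (directive named in this article):
--   lua_kafka_max_clients 4;

local kafka = require "resty.kafka.fast.producer"  -- hypothetical module name

local p = assert(kafka.new({
    brokers = "kafka-1:9092,kafka-2:9092",  -- placeholder addresses
}))

-- Batch interface: multiple messages cross the Lua/C boundary in one call,
-- amortizing the fixed per-call cost described above.
local ok, err = p:send_batch("gateway-events", {
    { key = ngx.var.request_id, value = ngx.var.request_uri },
    { key = ngx.var.request_id, value = ngx.var.status },
})
if not ok then
    ngx.log(ngx.ERR, "batch send failed: ", err)
end
```

The point of the sketch is the shape, not the names: a simple synchronous-looking Lua call, with all broker I/O handled by librdkafka in a separate execution unit.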

Performance Benchmarking

In actual performance tests, lua-resty-kafka-fast demonstrated significant advantages in write throughput.

  • In batch write scenarios, its single-CPU write performance can consistently achieve 250,000 QPS, which meets the throughput demands of high-frequency data collection and real-time decision-making.

By introducing an efficient batch writing mechanism at the edge, lua-resty-kafka-fast can significantly reduce the write cost per message. While maintaining a simple interface, it fully leverages Kafka’s processing potential in high-throughput scenarios. This makes reliable delivery of high-frequency data at the point of request ingress a feasible option, no longer requiring additional asynchronous links or intermediate buffering systems.

What This Changes at the Architecture Level

Introducing lua-resty-kafka-fast isn’t merely “replacing a Kafka client”; it fundamentally changes Kafka’s role and interaction within the system.

In common practice, API gateways primarily handle request orchestration and forwarding. Interactions with Kafka are often decoupled from the main request path, handled instead by independent services or out-of-band processing. While this approach offers a clear separation of concerns, it inevitably introduces additional communication overhead, context propagation costs, and operational burden.

When Kafka write capabilities can be integrated directly into the gateway in a more lightweight manner, certain processing steps that previously required external handling can now be reintegrated:

  • Events and request context are inherently linked: Message generation can occur directly at the request entry point, eliminating the need to copy or rebuild context information across multiple components.
  • Processing chain is more streamlined: Intermediate services previously responsible for writing to Kafka can be omitted, thereby reducing the overall system layers.
  • Request path is shorter and more stable: With fewer invocation steps, the overall response time becomes more consistent, and the blast radius of errors is easier to control.
  • System boundaries are easier to maintain: The number of components to deploy and manage is reduced, and the integration between the gateway and the messaging system becomes more direct.

This doesn’t imply that all Kafka-related logic should be centralized at the gateway layer. Instead, it means that when an event is inherently tied to a specific request, the gateway can now efficiently handle this type of work without introducing additional architectural complexity.
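To make the "events and request context are inherently linked" point concrete, here is a hedged sketch of what emission from a log_by_lua* handler could look like. The producer object and its send call are hypothetical; only the phase semantics (the log phase runs after the response has been sent) come from OpenResty itself:

```lua
-- Sketch for a log_by_lua* handler: this phase runs after the response
-- has gone out, so building and handing off the event never delays the
-- client. The `producer` object and send() call are hypothetical.
local event = {
    request_id = ngx.var.request_id,          -- context is already at hand
    uri        = ngx.var.uri,
    status     = tonumber(ngx.var.status),
    latency    = tonumber(ngx.var.request_time),
}
-- producer:send("gateway-events", event)    -- illustrative call only
```

Nothing here needs to be copied to another service or reconstructed downstream; the request context is consumed exactly where it exists.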

Use Cases

lua-resty-kafka-fast is designed for system scenarios with specific requirements regarding performance characteristics and resource utilization, and is particularly well-suited for the following environments:

  • Large-scale API Gateways: The gateway layer itself handles high-volume concurrent requests and needs to quickly and stably transform certain requests into event streams during request processing. In such scenarios, any additional overhead in the request path can be rapidly magnified. Therefore, rather than prioritizing the “generality” of client interfaces, write efficiency and controlling the impact on the main processing flow are more critical.
  • Edge Computing and Data Ingress Pipelines: Before data enters the core system, it needs to undergo filtering, trimming, or routing decisions at the ingress point. These nodes often already handle real-time traffic processing responsibilities and are better suited for directly delivering events, rather than transferring data to separate intermediary services.
  • High-throughput Event Ingestion Points: For example, scenarios involving logs, metrics, and user behavior data require high throughput and stability, but organizations typically do not wish to introduce an entirely separate set of consumer services solely for Kafka.

In other words, lua-resty-kafka-fast is not about “how to use Kafka in a Lua environment,” but rather how to perform event writing within the request processing pipeline with the lowest possible overhead, while maintaining the predictability of overall system behavior.

Kafka Belongs Where Decisions Are Made

lua-resty-kafka-fast is a proprietary library built and maintained by the OpenResty Inc. team. Its objective is not to cover all Kafka usage patterns, but rather to serve systems highly sensitive to throughput, latency, and resource boundaries, while ensuring engineering maintainability in long-term operations.

In most systems, Kafka sits deep within the business process, functioning as pure backend infrastructure. This placement is rarely a deliberate architectural choice; more often it is dictated by runtime models and practical engineering constraints.

The fundamental change introduced by lua-resty-kafka-fast is: it significantly reduces the engineering cost of shifting Kafka write operations earlier into the request entry point. When this paradigm shifts, events no longer need to wait until the end of the business process to enter the system. Instead, they can participate in decision-making, traffic routing, or analysis immediately upon request arrival. Given clear constraints and well-defined usage boundaries, Kafka is no longer merely an “end-of-pipeline logging system,” but can become an efficient, controllable event egress point within the request path. This is precisely the core problem lua-resty-kafka-fast aims to solve.

If you are evaluating integrating Kafka closer to the request entry point, or even directly deploying it within your API Gateway or edge computing layer, the following documentation can help you further determine whether lua-resty-kafka-fast is suitable for your system’s constraints and operational environment:

  • Installation and Prerequisites: Outlines the installation process, dependency requirements, and prerequisites for integrating lua-resty-kafka-fast with the OpenResty / Nginx runtime.

  • Usage Guide and API Reference: Provides detailed instructions on message production and consumption, configuration, error handling, and typical use cases.

If your current system is grappling with any of these challenges:

  • Significant performance or architectural friction between your API gateway and Kafka
  • The necessity of introducing additional intermediate services to decouple Kafka
  • Your team’s previous attempts at self-implementation resulted in setbacks regarding stability, memory consumption, or operational overhead

then a robust, enterprise-grade solution is available: click “Contact Us” in the bottom-right corner, and our engineering team will provide professional architectural advice and deployment support. You can also explore other private library products offered by OpenResty Inc., all designed with a focus on high-performance runtime, efficient resource management, and production-grade stability.

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at several internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing”, and “machine coding”, with over 22 years of programming experience and 16 years of open-source experience. Yichun is well-known in the open-source community as the project leader of OpenResty®, which is used by more than 40 million website domains worldwide.

OpenResty Inc., the enterprise software start-up Yichun founded in 2017, counts some of the biggest companies in the world among its customers. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool built on significantly enhanced dynamic-tracing technology. Its OpenResty Edge product is a powerful distributed traffic-management and private-CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.