Queue Growth, Dead-Letter Queues, and Why Asynchronous Failures Are Easy to Misread

Yotam Yemini

January 20, 2026

Asynchronous pipelines sit at the core of most modern systems. Message brokers accept traffic, consumers process it in the background, and downstream services depend on the results.

When these systems fail, the failure rarely shows up where it starts.

Teams often notice stale data, degraded behavior, or latency spikes elsewhere in the system. By the time those symptoms appear, the underlying problem has usually been present for some time.

In many real-world failures, two signals appear earlier: queue growth and dead-letter queues. They are widely monitored, but they are still widely misunderstood.

The common misunderstanding 

Queues are often treated as infrastructure components rather than behavioral signals. When a queue grows, it is attributed to load. When messages land in a DLQ, it is treated as a retry policy doing its job. Investigation tends to focus downstream, where symptoms are visible.

This framing obscures where asynchronous systems actually break. In many failures, message brokers continue to accept traffic normally. Producers succeed. Nothing looks obviously down. The problem is that consumers are no longer able to keep up reliably or consistently. That distinction matters.

Queue growth is not just volume 

Queue growth occurs when messages arrive faster than they can be processed successfully over time. This does not require a traffic spike. It can result from: 

  • Consumers slowing due to code changes or resource pressure 
  • Dependencies becoming latent or unreliable 
  • Retry rates increasing 
  • Backpressure failing to engage 
  • Partition skew concentrating work unevenly 

In these cases, the broker behaves correctly. Messages are accepted. The queue grows quietly. What is accumulating is lag. A sustained backlog means work is no longer flowing through the system at the intended rate, even if no explicit failures are visible yet.  
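
As a rough illustration, a sustained-growth check looks at the trend of queue depth across a window rather than a single reading. This is a minimal sketch in Python; sample_depth is a hypothetical stand-in for whatever your broker exposes (consumer lag in Kafka, ApproximateNumberOfMessages in SQS), and the window and threshold values are placeholders:

```python
from collections import deque
import time

def sample_depth() -> int:
    """Hypothetical stand-in for a broker metrics call
    (e.g. consumer lag in Kafka, ApproximateNumberOfMessages in SQS)."""
    raise NotImplementedError

def watch_for_sustained_growth(window: int = 30,
                               interval_s: float = 60.0,
                               min_growth_per_interval: float = 1.0) -> None:
    """Flag a backlog that grows across a whole window, not just a single spike."""
    samples: deque = deque(maxlen=window)
    while True:
        samples.append(sample_depth())
        if len(samples) == window:
            # Average growth per sampling interval over the full window.
            slope = (samples[-1] - samples[0]) / (window - 1)
            if slope > min_growth_per_interval:
                print(f"sustained queue growth: ~{slope:.1f} msgs/interval, "
                      f"depth {samples[0]} -> {samples[-1]}")
        time.sleep(interval_s)
```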

Why this matters before anything looks broken 

Asynchronous systems are designed to absorb instability. Queues buffer mismatches. Retries smooth over failures. Backlogs delay visible impact. This is useful, but it also postpones feedback. As queues grow: 

  • Processing time increases 
  • Derived state falls behind 
  • Downstream services operate on increasingly stale or incomplete data 

The transition from “degraded” to “broken” often appears sudden because the system has been accumulating lag for some time before any external threshold is crossed. 
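
A back-of-the-envelope calculation shows why the cliff feels sudden. If consumers run at 95% of the arrival rate, nothing is "down", but the backlog never drains; only real headroom gives a finite drain time. The numbers below are illustrative, not measured:

```python
def drain_time_s(backlog: int, arrival_rate: float, processing_rate: float) -> float:
    """Seconds until the backlog clears, assuming steady rates.
    With no headroom, the backlog never drains -- it only grows."""
    headroom = processing_rate - arrival_rate
    return backlog / headroom if headroom > 0 else float("inf")

# Consumers at 95% of the arrival rate look "almost fine", but the backlog is unbounded.
print(drain_time_s(backlog=50_000, arrival_rate=1_000, processing_rate=950))    # inf
# With 10% headroom, the same backlog still takes ~8 minutes to clear.
print(drain_time_s(backlog=50_000, arrival_rate=1_000, processing_rate=1_100))  # 500.0
```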

Dead-letter queues signal a different failure 

Dead-letter queues exist to capture messages that cannot be processed successfully. Messages land there after repeated failures, timeouts, or deterministic errors. DLQs prevent infinite retries and protect the main pipeline. What they represent is not transient instability, but persistent processing failure under current system behavior.

A non-empty DLQ means some class of messages cannot be handled as the system is currently operating. That incompatibility can come from: 

  • Broken contracts between producers and consumers 
  • Partial or skewed deployments 
  • Schema drift 
  • Unhandled edge cases 
  • Dependencies that fail consistently rather than intermittently 

DLQs often grow alongside backlogs, but they can also appear independently. 
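
For concreteness, the retry-then-dead-letter path most consumers implement has roughly this shape. This is a sketch, not any specific broker's API: handle and publish are hypothetical callables, the queue names are made up, and many brokers (SQS, RabbitMQ) do the redelivery counting for you:

```python
MAX_ATTEMPTS = 5

def consume(message: dict, handle, publish) -> None:
    """Retry a message a bounded number of times, then route it to the DLQ.

    `handle` and `publish` are hypothetical callables standing in for the
    consumer logic and broker client. A message that fails deterministically
    (bad schema, broken producer/consumer contract) exhausts every attempt
    and lands in the DLQ no matter how generous the retry policy is.
    """
    attempts = message.get("attempts", 0) + 1
    try:
        handle(message)
    except Exception:
        if attempts >= MAX_ATTEMPTS:
            publish("work.dlq", {**message, "attempts": attempts})
        else:
            publish("work.retry", {**message, "attempts": attempts})
```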

Why these problems are so common 

In real systems, producers and consumers evolve independently. Load shifts. Dependencies degrade. Retry behavior changes system dynamics in non-obvious ways. It is common for: 

  • Brokers to continue accepting traffic 
  • Queues to grow steadily 
  • Consumers to fail intermittently or slow down 
  • Processing failures to accumulate quietly 

Operationally, queues and DLQs sit between services. They rarely have clear ownership. They are easy to monitor superficially and hard to reason about in context. As a result, many teams only notice these issues once downstream behavior degrades.  

Queue growth and DLQs are often discussed together, but they answer different questions. 

Queue growth asks: "Are messages flowing through the system fast enough?"

DLQs ask: "Are some messages failing to be processed at all?"

In many incidents, sustained queue growth precedes DLQs. Consumers slow down, retries increase, and retry limits are eventually exceeded. In others, DLQs appear immediately due to deterministic processing failures, even while queue depth looks healthy. Treating one as a proxy for the other creates blind spots. 
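
One way to avoid that blind spot is to evaluate the two signals separately and only then combine them. A minimal sketch, with illustrative thresholds and metric names standing in for whatever your broker exposes:

```python
def classify(backlog_growth_per_min: float, dlq_arrivals_per_min: float) -> str:
    """Keep the two questions separate before combining them.

    Thresholds are illustrative; tune them to your traffic.
    """
    backlog_growing = backlog_growth_per_min > 100    # queue depth trending up
    dlq_filling = dlq_arrivals_per_min > 0            # any message failing outright

    if backlog_growing and dlq_filling:
        return "consumers are slow AND some messages cannot be processed"
    if backlog_growing:
        return "throughput problem: work flows, just not fast enough"
    if dlq_filling:
        return "processing failure: a class of messages cannot be handled"
    return "healthy"
```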

The deeper diagnostic challenge 

Most teams diagnose asynchronous failures indirectly. They look at:

  • Latency spikes 
  • Error rates 
  • Timeouts 
  • User-visible symptoms 

Those signals matter, but they are downstream effects. Earlier and more precise signals exist inside the message pipeline itself: where messages are accepted, where they slow down, and where they fail to be processed reliably. When those signals are ignored or misinterpreted, teams spend time chasing symptoms rather than isolating where the workflow is actually breaking.  

A better way to think about queues 

Queues are not just buffers. Queue growth is not harmless backlog. Dead-letter queues are not operational exhaust. They are indicators of whether asynchronous workflows are functioning as intended or quietly degrading under real conditions. 

The goal is not to watch queue depth; it is to continuously understand flow: where work accumulates, why it accumulates, and which downstream interactions it impacts.

Understanding them is not an optimization. It is foundational to operating reliable, event-driven systems. 

Shipped in v1.0.109: Causely now models the async failure mode behind queue growth and DLQs, so teams can pinpoint where processing breaks down instead of misreading the signals as generic load.

See the release notes

Know what to chase when everything breaks

See how Causely helps teams cut through cascading symptoms, identify the real source of issues, and act with confidence, both in production and before changes ship.