Circuit Breaker Pattern in Laravel: Protecting Your Queue from Failing APIs

Tags: Laravel, DevOps, queue, resilience, circuit breaker
Kirill Latish

The Circuit Breaker Pattern in Laravel helps manage API failures by preventing overwhelming retries when an external service is down. Learn how to implement this pattern to protect your queue.

It's Friday Night and Your Queue Is on Fire

Friday, 11:47 PM. Stripe's API starts returning 500s. Your Laravel application has 50,000 queued jobs — payment confirmations, subscription renewals, invoice generation — and every single one begins retrying against a dead endpoint. Within minutes, you've burned through your rate limits, your queue workers are saturated with doomed retries, and legitimate jobs from healthy services (email notifications, report generation, webhook deliveries) are stuck behind a wall of guaranteed failures.

Sound familiar? Maybe not Stripe specifically, but every production Laravel app that talks to external APIs is one outage away from this exact scenario.

Most Laravel applications treat API failures as isolated events. A job fails, it retries with backoff, maybe it lands in the failed_jobs table after three attempts. That works when one request out of a thousand hits a timeout. It does not work when an entire external service goes down and every job targeting that service fails simultaneously.

The circuit breaker pattern solves this. Borrowed from electrical engineering — where a breaker trips to prevent a short circuit from burning down the house — the software version detects systematic failures and stops making calls to a broken service before the damage spreads. Instead of 50,000 jobs each discovering independently that Stripe is down, the first handful of failures trip the breaker, and the rest skip execution immediately.

Let's build one in Laravel, from scratch and with existing packages, wired directly into the queue pipeline.

Why Individual Retries Aren't Enough

Laravel's built-in retry mechanism handles transient failures well. A brief network hiccup, a momentary database lock, a single request that times out. Set $tries = 3 and $backoff = [10, 60, 300], and the framework does the rest. But what happens when the failure isn't transient?
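As a quick refresher, those knobs are plain public properties on the job class. A minimal sketch (the SyncInvoice name and the handle() body are illustrative, not from the original setup):

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class SyncInvoice implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Retry up to 3 times, waiting 10s, 60s, then 300s between attempts
    public int $tries = 3;
    public array $backoff = [10, 60, 300];

    public function handle(): void
    {
        // Call the external API; a transient failure throws and triggers a retry
    }
}
```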

Say you have 10,000 pending Stripe charges in the queue. Stripe goes down. Every job attempts the API call, waits for the timeout (typically 30 seconds), fails, and re-enters the queue for retry. Three retries means 30,000 failed HTTP requests, each holding a queue worker hostage while it waits on a response that will never come. Run the math on 8 workers with 30-second timeouts: your throughput drops to roughly 16 jobs per minute. All failures.

The cascade hits fast. First, you exhaust your rate limits. All those retries hammer the failing API, and when Stripe eventually recovers, you're rate-limited for another hour because you blew through your quota beating on a dead endpoint. A 10-minute outage becomes a multi-hour problem.

Then your workers starve. They're a finite resource, and while they're all blocked waiting on Stripe timeouts, jobs for perfectly healthy services sit untouched. SendGrid emails? Queued. OpenAI completions? Queued. Internal webhooks? Also queued. Your users can't reset their passwords because Stripe is down. That shouldn't be a sentence that makes sense, but here we are.

Finally, your storage fills up. Failed jobs pile up in Redis or your jobs table. Retry metadata accumulates. If you're on the database queue driver, that table balloons and starts slowing down the queries that manage the queue itself. Fun.

And this isn't unique to payment processors. OpenAI's API returns 429s during peak hours, flooding your retry queue with AI jobs that won't succeed until the rate limit window resets. An email provider has a regional outage, and your transactional emails choke the pipeline. A partner's webhook endpoint goes dark for maintenance nobody told you about.


Every one of those retries thinks it's dealing with an isolated failure. Nobody is looking at the bigger picture. That's the gap the circuit breaker pattern fills.

The Circuit Breaker Pattern Explained

The name comes from electrical engineering. A physical circuit breaker trips when current exceeds safe levels, cutting the circuit before the wiring melts. The software version does the same thing: it wraps calls to an external service, tracks failures, and cuts the connection when things go sideways.

There are three states. In Closed mode, everything is normal. Requests flow through to the external service, the breaker quietly tracks success and failure counts, and nobody notices it's there. This is where you want to stay.

When failures cross a threshold, the breaker flips to Open. Now it short-circuits every request immediately. No HTTP call, no 30-second timeout, no wasted worker time. Jobs that hit an open breaker can fail fast, get released back to the queue for later, or trigger a fallback. The breaker is protecting you from yourself.

After a cooldown period (say, 60 seconds), the breaker enters Half-Open and lets a couple of probe requests through. Did the service recover? If those probes succeed, the breaker resets to Closed and traffic resumes. If they fail, it snaps back to Open and waits another cooldown cycle. Think of it as cautiously sticking your head out to check if the storm passed.


How do you decide when to trip? For a low-volume service, a simple consecutive failure count works fine: trip after 5 failures in a row. High-throughput services need something more nuanced, like a failure rate window: trip if more than 50% of requests failed in the last 60 seconds. You can also combine both (more than 10 failures AND a failure rate above 40% in 2 minutes) to avoid false trips from a couple of unlucky timeouts.
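For the windowed variant, one sketch is a Redis sorted set keyed by timestamp. The helper names and key scheme here are illustrative and separate from the class built below:

```php
use Illuminate\Support\Facades\Redis;

// Record each request outcome with its timestamp as the score.
function recordOutcome(string $service, bool $success): void
{
    $key = "breaker:{$service}:outcomes";
    $member = uniqid('', true) . ':' . ($success ? 'ok' : 'fail');
    Redis::zadd($key, time(), $member);
    // Drop entries older than the 60-second window
    Redis::zremrangebyscore($key, '-inf', time() - 60);
}

// Failure rate over the last 60 seconds, 0.0 when there's no traffic.
function failureRate(string $service): float
{
    $entries = Redis::zrangebyscore("breaker:{$service}:outcomes", time() - 60, '+inf');
    if (count($entries) === 0) {
        return 0.0;
    }
    $failures = count(array_filter($entries, fn ($e) => str_ends_with($e, ':fail')));
    return $failures / count($entries);
}

// Combined trip condition, guarding against low-volume false trips:
// count of recent requests > 10 && failureRate($service) > 0.4
```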

Getting the cooldown right takes some tuning. Set it too short and the breaker flaps constantly, sending probe requests into a service that hasn't had time to recover. Set it too long and your app stays degraded for minutes after the service comes back online. Start with 30-60 seconds for most external APIs. If the breaker trips repeatedly, apply exponential backoff: 30 seconds the first time, 60 the second, 120 the third.

One thing to watch: keep the half-open probe traffic small. One or two test requests is plenty to verify recovery. Letting all your pent-up traffic through the moment the cooldown expires can overwhelm a service that just came back, pushing it right back into failure.

Implementing Circuit Breakers in Laravel

Option 1: A Redis-Backed Circuit Breaker From Scratch

Building your own gives full control and zero external dependencies beyond Redis (which your Laravel queue likely uses already). Here's a production-ready implementation:

php
<?php

namespace App\Support;

use Illuminate\Support\Facades\Redis;

class CircuitBreaker
{
    public function __construct(
        private string $service,
        private int $failureThreshold = 5,
        private int $cooldownSeconds = 60,
        private int $halfOpenMaxAttempts = 2,
    ) {}

    public function isAvailable(): bool
    {
        $state = $this->getState();

        if ($state === 'closed') {
            return true;
        }

        if ($state === 'open') {
            // Check if cooldown has expired
            $openedAt = (int) Redis::get($this->key('opened_at'));
            $cooldown = $this->getCurrentCooldown();

            if (time() - $openedAt >= $cooldown) {
                $this->transitionTo('half-open');
                // Count this request as the first probe through the breaker
                Redis::incr($this->key('probe_count'));
                return true;
            }

            return false;
        }

        // Half-open: atomically count probes so only a limited number pass
        return Redis::incr($this->key('probe_count')) <= $this->halfOpenMaxAttempts;
    }

    public function reportSuccess(): void
    {
        $state = $this->getState();

        if ($state === 'half-open') {
            $this->reset();
        }

        // In closed state, reset consecutive failure count
        Redis::del($this->key('failures'));
    }

    public function reportFailure(): void
    {
        $state = $this->getState();

        if ($state === 'half-open') {
            $this->trip();
            return;
        }

        $failures = Redis::incr($this->key('failures'));
        Redis::expire($this->key('failures'), $this->cooldownSeconds * 2);

        if ($failures >= $this->failureThreshold) {
            $this->trip();
        }
    }

    private function trip(): void
    {
        // Track repeated trips to drive the exponential cooldown below
        Redis::incr($this->key('trip_count'));
        Redis::expire($this->key('trip_count'), 3600);

        Redis::set($this->key('opened_at'), time());
        $this->transitionTo('open');
    }

    private function reset(): void
    {
        Redis::del(
            $this->key('failures'),
            $this->key('opened_at'),
            $this->key('probe_count'),
            $this->key('trip_count'),
        );
        $this->transitionTo('closed');
    }

    private function getCurrentCooldown(): int
    {
        $tripCount = (int) Redis::get($this->key('trip_count'));
        // Exponential backoff: 60s, 120s, 240s..., capped at 600s
        return min($this->cooldownSeconds * (2 ** max($tripCount - 1, 0)), 600);
    }

    private function getState(): string
    {
        // phpredis returns false (not null) for missing keys, so ?: not ??
        return Redis::get($this->key('state')) ?: 'closed';
    }

    private function transitionTo(string $state): void
    {
        Redis::set($this->key('state'), $state);

        if ($state === 'half-open') {
            Redis::set($this->key('probe_count'), 0);
        }
    }

    private function key(string $suffix): string
    {
        return "circuit_breaker:{$this->service}:{$suffix}";
    }
}

Option 2: Using a Circuit Breaker Package

Several Composer packages bring circuit breaker functionality to PHP. The ejsmont-artur/php-circuit-breaker package has been a stable option in the PHP ecosystem for years, supporting APC and Memcached backends. For Laravel specifically, wrapping it into a service provider takes minimal effort. Newer packages targeting Laravel offer tighter integration with the framework's cache and event systems — search Packagist for circuit-breaker laravel to find the latest maintained options.

Either way, the integration point with Laravel queues is job middleware.

Integrating with Laravel Queue Middleware

Laravel's job middleware feature lets you wrap job execution with custom logic — exactly what a circuit breaker needs. Create a middleware that checks the breaker state before the job runs:

php
<?php

namespace App\Jobs\Middleware;

use App\Support\CircuitBreaker;
use Closure;

class CheckCircuitBreaker
{
    public function __construct(
        private string $service,
    ) {}

    public function handle(object $job, Closure $next): void
    {
        $breaker = new CircuitBreaker($this->service);

        if (! $breaker->isAvailable()) {
            // Release the job back to the queue with delay
            // instead of executing against a known-dead service
            $job->release(30);
            return;
        }

        try {
            $next($job);
            $breaker->reportSuccess();
        } catch (\Throwable $e) {
            $breaker->reportFailure();
            throw $e;
        }
    }
}

Apply the middleware to any job that depends on an external service:

php
<?php

namespace App\Jobs;

use App\Jobs\Middleware\CheckCircuitBreaker;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ProcessStripePayment implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 5;
    public array $backoff = [10, 30, 60];

    public function __construct(
        private string $paymentIntentId,
    ) {}

    public function middleware(): array
    {
        return [
            new CheckCircuitBreaker('stripe'),
        ];
    }

    public function handle(): void
    {
        // Make the Stripe API call
        $payment = \Stripe\PaymentIntent::retrieve($this->paymentIntentId);
        // Process the payment...
    }
}

Each external service gets its own circuit. Stripe failing doesn't trip the OpenAI breaker. SendGrid going down doesn't affect your payment processing. The breakers are independent, identified by the service name string passed to the middleware.
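For example, a hypothetical OpenAI-backed job just names a different circuit (class body abbreviated):

```php
class GenerateAICompletion implements ShouldQueue
{
    // ...same traits and properties as any queued job...

    public function middleware(): array
    {
        // 'openai' tracks its own failures, independent of 'stripe'
        return [new CheckCircuitBreaker('openai')];
    }
}
```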

Beyond the Breaker: Building a Resilient Queue Architecture

A circuit breaker alone won't save you. It's a critical piece, but think of it like a smoke detector: great at alerting you to the fire, not so great at fireproofing the house. You need the rest of the architecture to match.

Dedicated Queues per Service

Remember the worker starvation problem from earlier? The simplest fix is giving each external dependency its own queue with its own pool of workers. Stripe goes down? The payments workers are stuck, but your email workers keep humming along on a completely separate queue. Configure it in config/queue.php and your Supervisor setup:

php
// config/queue.php
'connections' => [
    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => env('REDIS_QUEUE', 'default'),
        'retry_after' => 90,
        'block_for' => null,
    ],
],
bash
# Supervisor configuration: separate workers per service queue
# payments worker — isolated from other service failures
php artisan queue:work redis --queue=payments --tries=3

# emails worker — keeps running even if payments queue is backed up
php artisan queue:work redis --queue=emails --tries=3

# ai-processing worker — separate timeout settings for long-running AI calls
php artisan queue:work redis --queue=ai-processing --timeout=120 --tries=2

# default worker — catches everything else
php artisan queue:work redis --queue=default --tries=3

Dispatch jobs to the appropriate queue explicitly:

php
ProcessStripePayment::dispatch($paymentIntentId)->onQueue('payments');
SendTransactionalEmail::dispatch($userId)->onQueue('emails');
GenerateAICompletion::dispatch($prompt)->onQueue('ai-processing');

Fallback Strategies

When the circuit is open, $job->release(30) is the lazy fallback: "try again in 30 seconds and hope for the best." It works, but you can do better. A payment job could persist the charge request to a pending table and process it when the breaker closes, so you don't lose the order. Email jobs can swap providers on the fly: if SendGrid is down, route through Postmark instead. AI features might serve a cached response or gracefully degrade to a non-AI code path. For anything user-facing, showing a "temporarily unavailable" message beats leaving people staring at a spinner. One caveat on release(): each release still counts as an attempt on most queue drivers, so during a long outage, breaker-protected jobs can exhaust $tries and land in failed_jobs; consider a higher $tries or a retryUntil() window on those jobs.
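Here's what the provider-failover idea could look like inside a mail job's handle() method. This is a sketch: the WelcomeEmail mailable and the 'sendgrid'/'postmark' mailer names are assumptions, standing in for whatever is configured in config/mail.php:

```php
use App\Support\CircuitBreaker;
use Illuminate\Support\Facades\Mail;

public function handle(): void
{
    // 'sendgrid' and 'postmark' are assumed mailers in config/mail.php
    $breaker = new CircuitBreaker('sendgrid');
    $mailer = $breaker->isAvailable() ? 'sendgrid' : 'postmark';

    try {
        Mail::mailer($mailer)->to($this->user)->send(new WelcomeEmail());

        // Only report outcomes for the service this breaker watches
        if ($mailer === 'sendgrid') {
            $breaker->reportSuccess();
        }
    } catch (\Throwable $e) {
        if ($mailer === 'sendgrid') {
            $breaker->reportFailure();
        }
        throw $e;
    }
}
```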

Monitoring and Alerting

What good is a circuit breaker if nobody knows it tripped? Fire a Laravel event when a breaker changes state, and hook that into your monitoring stack:

php
// Inside the trip() method of CircuitBreaker:
event(new CircuitBreakerTripped($this->service));

// In your EventServiceProvider or a listener:
// Send to Slack, PagerDuty, Datadog — whatever your team monitors

Pair this with Laravel's built-in queue:monitor command to alert when queue sizes exceed thresholds. If a queue is growing while its breaker is open, that's actually fine. The service is down, jobs are accumulating, but they're not wasting workers or rate limits. That's the system doing its job. The alert tells you to investigate; the breaker keeps things from getting worse while you do.
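A minimal wiring for this, with an event class of our own naming (CircuitBreakerTripped is not a framework class) plus a scheduled queue:monitor check:

```php
// app/Events/CircuitBreakerTripped.php
namespace App\Events;

class CircuitBreakerTripped
{
    public function __construct(
        public string $service,
    ) {}
}

// routes/console.php: alert if a queue backs up past 500 pending jobs
use Illuminate\Support\Facades\Schedule;

Schedule::command('queue:monitor redis:payments,redis:emails --max=500')
    ->everyMinute();
```

When a monitored queue crosses its threshold, Laravel dispatches an Illuminate\Queue\Events\QueueBusy event, which you can listen for the same way as the breaker event.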

One more thing: expose your breaker states through a health check endpoint. An /api/health route that reports which circuits are open gives your ops team instant visibility without digging through logs. Your load balancer can use it too. For teams building AI-powered products that depend on model APIs like OpenAI or Anthropic (where rate limits and outages are a regular reality), this kind of operational visibility is non-negotiable. The AI Product Manager course goes deep on building this kind of resilience into AI product architectures.
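A minimal sketch of such a route, using the Redis key scheme from the CircuitBreaker class above (the service names are illustrative). It reads the state key directly rather than calling isAvailable(), which would consume half-open probes as a side effect:

```php
// routes/api.php
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Facades\Route;

Route::get('/health', function () {
    $services = ['stripe', 'sendgrid', 'openai']; // whatever you track

    $circuits = collect($services)->mapWithKeys(fn (string $service) => [
        // Read-only: no probe-counting or state-transition side effects
        $service => Redis::get("circuit_breaker:{$service}:state") ?: 'closed',
    ]);

    return response()->json([
        'status' => $circuits->contains(fn ($state) => $state !== 'closed')
            ? 'degraded'
            : 'ok',
        'circuits' => $circuits,
    ]);
});
```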

Failing Gracefully Is an Architecture Decision

Go back to the Friday-night Stripe scenario. With a circuit breaker in place, the first 5 failed payment jobs trip the breaker. The remaining 49,995 jobs hit the middleware, see the open circuit, release themselves back to the queue with a 30-second delay, and move on in milliseconds. Your workers stay free to process emails, webhooks, and everything else. When Stripe recovers, the breaker probes it, confirms it's healthy, and the payment jobs start flowing again. No rate limit exhaustion. No worker starvation. No multi-hour recovery tail.

None of this requires exotic infrastructure. Redis you probably already have, a single PHP class, a job middleware, and some Supervisor config for per-service queues. Everything in this article runs on Laravel 12.x.

But here's the thing: you have to build this before the outage. Nobody adds circuit breakers at 11:47 PM on a Friday while Slack is blowing up. These are decisions you make during architecture and review, not during incident response. If you're working on systems that need to stay up under real traffic with real external dependencies, the Highload Software Architecture course covers the full picture: circuit breakers, bulkheads, backpressure, and the other resilience patterns that separate production-grade systems from apps that just happen to work when everything is fine.
