Bulkhead Pattern for Fault Isolation

The bulkhead pattern divides a system into isolated compartments, each with its own dedicated resources. The name comes from the watertight compartments in a ship’s hull: if one section floods, the sealed bulkheads keep water out of the others. Applied to software, this means a failing component cannot exhaust the resources that other components depend on.

How It Works

Without isolation, all components share a single resource pool. One misbehaving component can starve the rest:

Shared pool (10 threads):
  Service A (healthy): needs 3 threads, 0 available → fails
  Service B (failing): holds all 10 threads waiting on timeouts
  Service C (healthy): needs 2 threads, 0 available → also fails

With bulkheads, each component gets its own allocation:

Bulkhead A (4 threads): Service A uses 3, 1 idle
Bulkhead B (3 threads): Service B saturates its 3, blocks
Bulkhead C (3 threads): Service C uses 2, 1 idle → unaffected

Types of Bulkheads

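Thread pool isolation assigns separate thread pools to different operations. A slow external call blocks only its own pool. In a single-threaded runtime such as Node.js there are no per-operation thread pools to assign, so the same effect is achieved by capping the number of concurrent calls per dependency with a semaphore: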

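// `Semaphore` is not a JavaScript built-in. This example assumes a counting
// semaphore whose acquire() resolves to a permit object with a release() method.
const bulkheads = {
  payments: new Semaphore(5), // Max 5 concurrent payment calls
  emails: new Semaphore(10),  // Max 10 concurrent email sends
  search: new Semaphore(8),   // Max 8 concurrent search queries
};

// Run fn inside the named bulkhead, waiting for a free permit if necessary.
async function withBulkhead(name, fn) {
  const permit = await bulkheads[name].acquire();
  try {
    return await fn();
  } finally {
    permit.release(); // Always return the permit, even if fn throws
  }
}

// Payment service failure cannot starve email or search.
// (Call from an async function or ES module; chargeCard and order are defined elsewhere.)
await withBulkhead('payments', () => chargeCard(order));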
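
The `Semaphore` in the example above is not part of JavaScript itself. A minimal sketch of one is shown here, assuming only the interface the example implies (`acquire()` resolves to a permit with a `release()` method). The class, its field names, and the first-come-first-served waiting policy are illustrative assumptions, not a specific library's API.

```javascript
// Minimal counting semaphore (illustrative sketch, not a built-in or library API).
class Semaphore {
  constructor(maxConcurrent) {
    this.available = maxConcurrent; // permits not currently held
    this.waiters = [];              // callbacks for callers waiting on a permit
  }

  // Resolves to a permit object once a slot is free.
  acquire() {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve(this.makePermit());
    }
    // No slot free: park the caller until a permit is released.
    return new Promise((resolve) => {
      this.waiters.push(() => resolve(this.makePermit()));
    });
  }

  makePermit() {
    let released = false;
    return {
      release: () => {
        if (released) return; // ignore accidental double release
        released = true;
        const next = this.waiters.shift();
        if (next) {
          next(); // hand the slot directly to the oldest waiter
        } else {
          this.available += 1;
        }
      },
    };
  }
}
```

Callers that arrive when every permit is held wait in order. A production bulkhead would usually also bound that wait (a timeout or a maximum queue length), so that callers fail fast instead of piling up behind a stalled dependency.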

Process isolation runs different workloads in separate processes or containers. A memory leak in one process cannot affect others.

Queue isolation assigns different task types to separate queues with dedicated workers. A spike in one task type cannot delay processing of another.
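
Queue isolation can be sketched in-process as one queue per task type, each drained by its own fixed number of workers. The `IsolatedQueue` class, the task types (`thumbnails`, `reports`), and the worker counts are hypothetical names chosen for illustration. A real system would more likely use a message broker with separate queues and consumer groups.

```javascript
// One in-memory queue per task type, each with its own concurrency budget.
class IsolatedQueue {
  constructor(workerCount) {
    this.workerCount = workerCount; // max tasks of this type running at once
    this.pending = [];              // tasks waiting for a free worker
    this.running = 0;               // workers currently busy
  }

  // Enqueue a task (a function returning a value or promise).
  // The returned promise settles with the task's outcome.
  submit(task) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this.drain();
    });
  }

  // Start pending tasks while workers are free.
  drain() {
    while (this.running < this.workerCount && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.running += 1;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          this.running -= 1;
          this.drain(); // pick up the next task of this type, if any
        });
    }
  }
}

// Hypothetical task types and worker counts.
const queues = {
  thumbnails: new IsolatedQueue(2),
  reports: new IsolatedQueue(1),
};
```

A backlog in `queues.reports` only grows that queue's `pending` list. Tasks submitted to `queues.thumbnails` still start immediately as long as one of its own workers is free.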

Sizing Bulkheads

Setting the right limits requires balancing two risks:

  • Too large: The bulkhead allows one component to consume too many resources before hitting its cap
  • Too small: Normal traffic gets throttled because the allocation is insufficient

Start by measuring each component’s typical and peak resource usage. Set the bulkhead limit above peak but below the level that would harm other components. Revisit these numbers as traffic patterns evolve.
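
As a rough illustration of this sizing process, the sketch below estimates concurrency from measured traffic, adds headroom above peak, and checks that all limits together fit the shared capacity. The function names and the 1.25 headroom factor are assumptions made for the example. The concurrency estimate uses Little's law (average calls in flight = arrival rate × average latency), which is one common starting point and not the only way to size a bulkhead.

```javascript
// Estimate calls in flight via Little's law: rate (requests/s) * latency (s).
function estimateConcurrency(requestsPerSecond, latencySeconds) {
  return requestsPerSecond * latencySeconds;
}

// Suggest a limit above peak concurrency, rounded up.
// The default headroom factor of 1.25 is an arbitrary illustrative choice.
function suggestLimit(peakConcurrency, headroom = 1.25) {
  return Math.ceil(peakConcurrency * headroom);
}

// Check that all bulkhead limits together fit within the shared capacity.
function fitsBudget(limits, totalCapacity) {
  const total = Object.values(limits).reduce((sum, n) => sum + n, 0);
  return { total, fits: total <= totalCapacity };
}

// Example: 12 requests/s at 0.25 s average latency is about 3 calls in flight.
const peakA = estimateConcurrency(12, 0.25); // 3
const limitA = suggestLimit(peakA);          // 4

// The allocation from "How It Works": 4 + 3 + 3 threads in a 10-thread system.
const budget = fitsBudget({ a: limitA, b: 3, c: 3 }, 10); // { total: 10, fits: true }
```

If `fits` comes back `false`, either the limits need trimming or the shared capacity has to grow. Otherwise the bulkheads promise more resources than the system has.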

When to Use the Bulkhead Pattern

  • Services calling multiple external APIs with varying reliability
  • Multi-tenant systems where one tenant’s activity should not affect others
  • Worker pools processing mixed task types with different resource profiles
  • Any architecture where a single slow dependency has caused system-wide outages before

Bulkheads and Circuit Breakers

Bulkheads and circuit breakers complement each other. Bulkheads contain the blast radius of a failure by limiting how many resources a component can consume. Circuit breakers detect repeated failures and stop sending requests to the failing dependency until it has had time to recover. Together, they form a layered defense: the bulkhead prevents resource starvation, while the circuit breaker avoids wasting time and capacity on calls that are likely to fail.
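
The layering can be sketched as follows. Both classes are deliberately simplified assumptions written for this example: the bulkhead rejects immediately when full instead of queueing, and the breaker counts consecutive failures and allows calls through again after a fixed reset interval. The names `Bulkhead`, `CircuitBreaker`, and `protect` are hypothetical.

```javascript
// Fail-fast bulkhead: rejects immediately when all slots are taken.
class Bulkhead {
  constructor(limit) {
    this.limit = limit;
    this.active = 0; // calls currently in flight
  }
  async run(fn) {
    if (this.active >= this.limit) throw new Error('bulkhead full');
    this.active += 1;
    try {
      return await fn();
    } finally {
      this.active -= 1;
    }
  }
}

// Minimal circuit breaker: opens after N consecutive failures,
// then lets calls through again once resetMs has elapsed.
class CircuitBreaker {
  constructor(failureThreshold, resetMs) {
    this.failureThreshold = failureThreshold;
    this.resetMs = resetMs;
    this.failures = 0;    // consecutive failures seen so far
    this.openedAt = null; // time the circuit opened, or null if closed
  }
  async run(fn) {
    if (this.openedAt !== null && Date.now() - this.openedAt < this.resetMs) {
      throw new Error('circuit open'); // skip the call entirely
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null; // a success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err;
    }
  }
}

// Ordering is a design choice. Here the bulkhead is outermost, so
// 'bulkhead full' rejections are not counted as dependency failures.
function protect(bulkhead, breaker, fn) {
  return bulkhead.run(() => breaker.run(fn));
}
```

With this arrangement, a dependency that keeps failing trips the breaker, and later calls are rejected with `circuit open` almost immediately, so they hold a bulkhead slot only briefly. Production breakers typically add an explicit half-open state that admits a single trial call, which this sketch omits.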