Playbook

The webhook handler that handles retries.

Stripe, GitHub, and Linear webhooks processed reliably even when your downstream system is flaky. Idempotent, ordered, and replayable.

9 min read

The pain

Why webhook handlers go wrong

Stripe sends a webhook for every charge. Your handler receives it, fires off a Slack notification, updates HubSpot, and writes to the database before it responds. The Slack call hangs long enough that Stripe gives up waiting and marks the delivery failed, even though all three side effects eventually complete. Stripe retries the webhook three times. Now you have four Slack messages and four HubSpot updates for one charge.

The four ways naive webhook handlers break:

  • Synchronous side effects. One slow downstream call holds the whole webhook hostage. Stripe times out at 30 seconds and retries.
  • No idempotency. Retried webhooks duplicate every side effect, so one charge gets notified four times.
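  • Out-of-order delivery. Providers don't guarantee delivery order. GitHub reports a merge as a pull_request event with action closed, and it can arrive before an earlier synchronize event for the same PR. Your state machine breaks because it expected ordered events.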
  • Retry storms. Your downstream system goes down for 10 minutes. Stripe retries 50 webhooks in that window. Your downstream comes back up and immediately gets hit by 50 simultaneous requests. It goes down again.

The architecture

What we're building

A two-tier handler. Tier 1 is the HTTP endpoint Stripe calls: it validates the signature, persists the event to the database, triggers the processor, and returns 200 in under 100ms. Tier 2 is a Rotor workflow that processes events from the database asynchronously, with idempotent side effects and rate-limited downstream calls.
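The architecture calls for rate-limited downstream calls but doesn't say how Rotor enforces the limit. A token bucket is one common technique: it allows a short burst, then caps the sustained rate, so a backlog that drains after an outage reaches Slack at a steady pace instead of all at once. A minimal in-process sketch, assuming a hypothetical `TokenBucket` class with an injectable clock (not a Rotor API):

```typescript
// Token-bucket limiter: a sketch of one way to rate-limit downstream calls.
// Not Rotor's implementation; capacity and refill rate are illustrative.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number = Date.now()) {
    this.capacity = capacity; // largest burst allowed
    this.refillPerSec = refillPerSec; // sustained calls per second
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Credit tokens earned since the last check, never exceeding capacity.
  private refill(nowMs: number): void {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
  }

  // Spends one token and returns true if a call may go out now; otherwise false.
  tryTake(nowMs: number = Date.now()): boolean {
    this.refill(nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// A bucket of 5 with 1 token/second: a burst of 5, then one call per second.
const slackLimiter = new TokenBucket(5, 1, 0);
const burst = [0, 0, 0, 0, 0, 0].map((t) => slackLimiter.tryTake(t));
console.log(burst.filter((ok) => ok).length); // 5
```

An in-memory bucket only limits one process. With several workers you would keep the bucket in shared storage such as Redis, or rely on whatever limiter the platform provides.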

  • Persist first, process async. The webhook endpoint never makes downstream calls. It writes the event to Postgres, triggers the workflow, and returns 200 immediately.
  • Idempotent on event ID. The processor uses the Stripe event ID as the idempotency key. Replays are safe.
  • Per-resource ordering. Events for the same customer are processed one at a time using a concurrency key, so they never interleave.
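A concurrency key with a limit of 1 boils down to this: tasks that share a key run one after another, while tasks with different keys run in parallel. The in-process sketch below shows that behavior with promise chaining. `KeyedSerialQueue` is a hypothetical helper, not how Rotor implements concurrency keys, and it only serializes work inside a single process.

```typescript
// Runs tasks one at a time per key, in submission order; different keys run
// concurrently. An in-process sketch of what a concurrency key with limit 1 does.
class KeyedSerialQueue {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const prev: Promise<unknown> = this.tails.get(key) ?? Promise.resolve();
    // Start after the previous task for this key settles, success or failure.
    const next = prev.then(task, task);
    // Keep a never-rejecting tail so one failure doesn't block later tasks.
    const tail = next.catch(() => undefined);
    this.tails.set(key, tail);
    // Forget the key once this is the last queued task for it.
    tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return next;
  }
}
```

A failed task doesn't block later tasks for the same key; the next one starts once the previous promise settles either way. That matches the intent of the bullet above: a customer's events never interleave, and other customers aren't held up.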

The webhook endpoint

Tier 1: receive and persist

The HTTP handler does three things and returns. No downstream calls.

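// app/api/webhooks/stripe/route.ts
import Stripe from "stripe";
import { rotor } from "@/lib/rotor";
import { db } from "@/lib/db"; // assumed path: your Prisma-style database client

// Env var names are assumptions; use whatever your deployment defines.
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const WEBHOOK_SECRET = process.env.STRIPE_WEBHOOK_SECRET!;

export async function POST(req: Request) {
  const sig = req.headers.get("stripe-signature");
  const body = await req.text(); // raw body: verification needs the exact bytes Stripe signed

  if (!sig) {
    return Response.json({ error: "missing signature" }, { status: 400 });
  }

  // 1. Verify signature
  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(body, sig, WEBHOOK_SECRET);
  } catch {
    return Response.json({ error: "invalid signature" }, { status: 400 });
  }

  // 2. Persist to DB (idempotent: unique on event.id)
  await db.stripeEvents.upsert({
    where: { stripe_event_id: event.id },
    create: {
      stripe_event_id: event.id,
      type: event.type,
      payload: event,
      received_at: new Date(),
    },
    update: {},  // already received, no-op
  });

  // 3. Trigger Rotor workflow (idempotent on event.id).
  //    customerId feeds the processor's per-customer concurrency key.
  const object = event.data.object as { customer?: unknown };
  const customerId = typeof object.customer === "string" ? object.customer : null;
  await rotor.workflows.trigger("process-stripe-event", {
    data: { eventId: event.id, customerId },
    idempotencyKey: event.id,
  });

  return Response.json({ received: true });
}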

The endpoint returns in under 100ms. Stripe is happy. The DB upsert and the Rotor trigger are both idempotent on event.id, so retries are safe.
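Retry safety comes down to two insert-if-absent operations keyed on the same ID. The simulation below delivers one event four times (the original plus three retries) and ends with one stored row and one workflow run. `EventStore` and `WorkflowTrigger` are hypothetical in-memory stand-ins for the Postgres table and Rotor's trigger dedup, with signature verification left out.

```typescript
// Hypothetical in-memory stand-ins for the stripeEvents table and Rotor's
// trigger dedup, used only to show why tier 1 is safe to retry.
interface StripeEventRow {
  stripeEventId: string;
  type: string;
  receivedAt: Date;
}

class EventStore {
  readonly rows = new Map<string, StripeEventRow>();

  // Insert-if-absent: the same effect as the upsert with `update: {}`.
  upsert(id: string, type: string): void {
    if (!this.rows.has(id)) {
      this.rows.set(id, { stripeEventId: id, type, receivedAt: new Date() });
    }
  }
}

class WorkflowTrigger {
  readonly runs: string[] = [];
  private readonly seenKeys = new Set<string>();

  // Starts a run only the first time an idempotency key is seen.
  trigger(idempotencyKey: string): void {
    if (this.seenKeys.has(idempotencyKey)) return;
    this.seenKeys.add(idempotencyKey);
    this.runs.push(idempotencyKey);
  }
}

// Tier 1 logic minus signature verification: persist, then trigger.
function receive(store: EventStore, trigger: WorkflowTrigger, event: { id: string; type: string }): void {
  store.upsert(event.id, event.type);
  trigger.trigger(event.id);
}

// The original delivery plus three retries of the same event.
const store = new EventStore();
const trigger = new WorkflowTrigger();
for (let attempt = 0; attempt < 4; attempt++) {
  receive(store, trigger, { id: "evt_123", type: "charge.succeeded" });
}
console.log(store.rows.size, trigger.runs.length); // 1 1
```

In Postgres the same guarantee comes from the unique constraint on stripe_event_id, which also covers two retries landing at the same moment. The in-memory version is single-threaded and only illustrates the logic.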

The processor

Tier 2: process with idempotency and ordering

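// Assumed module paths (hypothetical): point these at your own Rotor setup and clients.
import { workflow, NonRetriable } from "@/lib/rotor";
import { db } from "@/lib/db";
import { slack, hubspot } from "@/lib/clients";
import { handleChurn } from "./handle-churn";

workflow({
  id: "process-stripe-event",

  // Per-customer ordering: events for the same customer process serially.
  // The key reads customerId, so the trigger must include it in `data`.
  concurrency: {
    key: "event.data.customerId",
    limit: 1,
  },

  steps: async ({ event, step }) => {
    const stripeEvent = await step.run("load-event", () =>
      db.stripeEvents.findUnique({ where: { stripe_event_id: event.data.eventId } })
    );

    if (!stripeEvent) {
      throw new NonRetriable(`Event ${event.data.eventId} not found`);
    }

    // Each side effect is its own step, so a retry re-runs only the step that failed.
    switch (stripeEvent.type) {
      case "charge.succeeded":
        await step.run("notify-slack", () =>
          // amount is in the currency's smallest unit (e.g. cents for USD)
          slack.send("#charges", `New charge: ${stripeEvent.payload.data.object.amount}`)
        );
        await step.run("update-hubspot", () =>
          hubspot.deal.update(stripeEvent.payload.data.object.customer, {
            last_payment_at: stripeEvent.payload.created,
          })
        );
        break;

      case "customer.subscription.deleted":
        await step.invoke("churn-flow", {
          workflow: handleChurn,
          data: { customerId: stripeEvent.payload.data.object.customer },
        });
        break;

      default:
        // No side effects for other event types; they are still marked processed.
        break;
    }

    // Mark every event processed, not only charges.
    await step.run("mark-processed", () =>
      db.stripeEvents.update({
        where: { stripe_event_id: event.data.eventId },
        data: { processed_at: new Date() },
      })
    );
  },
});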

Three things this gets right that a synchronous handler can't:

  • Each side effect is a step. If HubSpot times out, only the HubSpot step retries. The Slack step already succeeded, so its result is reused and the message isn't sent again.
  • Ordering per customer. Two events for the same customer process one after the other, even if they arrived simultaneously.
  • Idempotency on the workflow. Stripe can deliver the same event more than once, and it retries any delivery that fails. A second trigger with the same event ID collapses into the first run.
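The "only the failed step retries" behavior comes from recording each completed step's result and replaying it when the workflow function runs again. A toy synchronous sketch of that idea, assuming a hypothetical `execute` loop and an in-memory journal (Rotor's real engine persists step results and runs steps asynchronously):

```typescript
// Toy durable-step runner (a sketch of the idea, not Rotor's engine).
// Completed step results are journaled by step ID; when the workflow function
// re-runs after a failure, journaled steps return their saved result instead
// of executing again.
type StepFn = <T>(id: string, fn: () => T) => T;

function makeStepRunner(journal: Map<string, unknown>): { run: StepFn } {
  return {
    run<T>(id: string, fn: () => T): T {
      if (journal.has(id)) return journal.get(id) as T; // replay, no side effect
      const result = fn(); // a throw propagates and nothing is journaled
      journal.set(id, result);
      return result;
    },
  };
}

// Re-runs the whole workflow function until it completes; returns the attempt count.
function execute(workflowFn: (step: { run: StepFn }) => void, maxAttempts: number): number {
  const journal = new Map<string, unknown>();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      workflowFn(makeStepRunner(journal));
      return attempt;
    } catch {
      // A real engine would wait (with backoff) before the next attempt.
    }
  }
  throw new Error(`workflow failed after ${maxAttempts} attempts`);
}

// Slack succeeds; HubSpot times out once, then succeeds.
let slackCalls = 0;
let hubspotCalls = 0;
const attempts = execute((step) => {
  step.run("notify-slack", () => {
    slackCalls++;
    return "sent";
  });
  step.run("update-hubspot", () => {
    hubspotCalls++;
    if (hubspotCalls === 1) throw new Error("HubSpot timeout");
    return "updated";
  });
}, 3);
console.log(attempts, slackCalls, hubspotCalls); // 2 1 2
```

Step IDs are the journal keys, so they must stay stable across attempts. Renaming a step while runs are in flight would make the new code miss the recorded result and run that side effect again.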

Edge cases

What goes wrong, and how to handle it

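Out-of-order events. Stripe doesn't guarantee delivery order, so you can receive customer.subscription.deleted before customer.subscription.created. The per-customer concurrency key stops the two from running at the same time, but it doesn't reorder them. At the top of the workflow, compare the event's created timestamp with the newest event already applied for that customer and skip anything older, or re-fetch the object's current state from Stripe instead of trusting the stale payload.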
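One way to implement that check is a per-customer watermark: remember the `created` time of the newest event applied for each customer and skip anything older. A sketch with a hypothetical `CustomerWatermarks` class held in memory; in the real workflow the watermark would live in Postgres and be read and updated inside a step.

```typescript
// Hypothetical per-customer watermark: the `created` time (Unix seconds) of the
// newest event already applied for each customer.
class CustomerWatermarks {
  private newest = new Map<string, number>();

  // Returns true if the event should be applied, and advances the watermark.
  // Events strictly older than the watermark are stale and are skipped.
  // Equal timestamps are applied, since `created` has one-second resolution.
  shouldApply(customerId: string, createdSec: number): boolean {
    const last = this.newest.get(customerId);
    if (last !== undefined && createdSec < last) return false;
    this.newest.set(customerId, createdSec);
    return true;
  }
}

const marks = new CustomerWatermarks();
console.log(marks.shouldApply("cus_1", 1_700_000_100)); // true  (subscription.deleted arrives first)
console.log(marks.shouldApply("cus_1", 1_700_000_000)); // false (older subscription.created is stale)
console.log(marks.shouldApply("cus_2", 1_700_000_000)); // true  (other customers are unaffected)
```

Because the concurrency key runs a customer's events one at a time, the read-compare-update never races with another event for the same customer.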

Downstream outage. Slack is down for 10 minutes. The Slack step retries with exponential backoff until Slack recovers. Between retries the workflow is paused with zero compute cost. When Slack comes back, the step succeeds and the workflow continues from where it stopped.
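The text doesn't say how Rotor spaces its retries beyond "exponential backoff." A common refinement is full jitter: each retry waits a random time between zero and the exponential ceiling, which spreads retries out and guards against the retry storm described earlier. A sketch, with `backoffDelayMs`, the 1-second base, and the 10-minute cap all assumptions rather than Rotor defaults:

```typescript
// Exponential backoff with full jitter: the delay is a random value in
// [0, min(cap, base * 2^attempt)). Randomness keeps many failed calls from
// retrying in lockstep when the downstream recovers.
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  baseMs = 1_000,
  capMs = 10 * 60_000,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// With a fixed random() of 0.5, each delay is half the exponential ceiling.
const half = () => 0.5;
console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n, 1_000, 600_000, half)).join(", ")); // 500, 1000, 2000, 4000
```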

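Replay an old event. A bug shipped that ignored a customer state change. To replay all events for a customer, query the stripeEvents table and re-trigger the workflow for each event, oldest first, with a new idempotency key per replay (for example, the event ID plus a replay suffix). Reusing the original keys would collapse each trigger into the original run, and nothing would re-execute. A replay is a fresh run, so every side effect fires again: repeating the HubSpot update is harmless because it sets the same value, but the Slack message posts a second time.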
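A replay sketch under stated assumptions: `StoredEvent` is a hypothetical row shape, and `TriggerFn` stands in for a call shaped like the `rotor.workflows.trigger(...)` used in the endpoint. Events are replayed oldest first, each under `<eventId>:replay:<replayId>`, so the replay gets fresh keys but is itself safe to re-run.

```typescript
// Hypothetical row shape from the stripeEvents table.
interface StoredEvent {
  stripeEventId: string;
  customerId: string | null;
  created: number; // Stripe's `created`, Unix seconds
}

// Stand-in for the workflow trigger used by the endpoint (shape assumed).
type TriggerFn = (
  workflowId: string,
  opts: { data: { eventId: string; customerId: string | null }; idempotencyKey: string },
) => Promise<void>;

// Re-triggers events oldest first under a fresh key per replay.
// Reusing the bare event ID would dedupe into the original run.
async function replayEvents(events: StoredEvent[], replayId: string, trigger: TriggerFn): Promise<number> {
  const ordered = [...events].sort((a, b) => a.created - b.created);
  for (const e of ordered) {
    await trigger("process-stripe-event", {
      data: { eventId: e.stripeEventId, customerId: e.customerId },
      idempotencyKey: `${e.stripeEventId}:replay:${replayId}`,
    });
  }
  return ordered.length;
}
```

Running the script twice with the same replayId triggers nothing new, because the keys repeat. Pick a new replayId for each deliberate replay.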

Webhook secret rotation. Verify against the new secret first and fall back to the old one for a 24-hour window. Deliveries signed with either secret are accepted, so no event is rejected mid-rotation and there is no downtime.
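stripe.webhooks.constructEvent checks one secret per call, so the simplest rotation is to call it with the new secret, then the old one, and accept the first that doesn't throw. The sketch below shows the check underneath, using node:crypto and Stripe's documented scheme: the Stripe-Signature header carries `t=<timestamp>` and one or more `v1=` signatures, each an HMAC-SHA256 of `<timestamp>.<raw body>`. `verifyWithSecrets` and its 300-second tolerance are assumptions for illustration; in production, prefer the official library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Splits a Stripe-Signature header ("t=...,v1=...,v1=...") into the raw
// timestamp string and every v1 signature it carries.
function parseSignatureHeader(header: string): { timestamp: string; v1: string[] } {
  let timestamp = "";
  const v1: string[] = [];
  for (const part of header.split(",")) {
    const [key, value] = part.split("=", 2);
    if (key === "t" && value) timestamp = value;
    else if (key === "v1" && value) v1.push(value);
  }
  return { timestamp, v1 };
}

// Accepts the payload if any v1 signature matches any secret (new secret first,
// then the old one during the overlap window) and the timestamp is fresh.
function verifyWithSecrets(
  rawBody: string,
  header: string,
  secrets: string[],
  toleranceSec = 300,
  nowSec = Math.floor(Date.now() / 1000),
): boolean {
  const { timestamp, v1 } = parseSignatureHeader(header);
  const t = Number(timestamp);
  if (!timestamp || !Number.isFinite(t) || Math.abs(nowSec - t) > toleranceSec) return false;

  for (const secret of secrets) {
    // Stripe signs "<timestamp>.<raw body>" with HMAC-SHA256.
    const expected = createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest();
    for (const sig of v1) {
      const given = Buffer.from(sig, "hex");
      // Length check first: timingSafeEqual throws on unequal lengths.
      if (given.length === expected.length && timingSafeEqual(given, expected)) return true;
    }
  }
  return false;
}
```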

The math

What this costs on Rotor

A typical SaaS receives ~5,000 Stripe events a month. Each event = 3-4 step-runs (load, side effect 1, side effect 2, mark processed). ~17,500 step-runs/month.

Add GitHub webhooks (~3,000/mo) and Linear webhooks (~1,500/mo) using the same pattern. Total: ~9,500 events/month, or ~33,000 step-runs/month at the same 3-4 step-runs per event.
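The arithmetic, spelled out. The event volumes come from the text; the 3.5 step-runs per event is an assumed midpoint of the 3-4 range, and `estimateStepRuns` is a hypothetical helper for this check:

```typescript
// Back-of-envelope step-run estimate for the volumes quoted in the text.
interface WebhookSource {
  name: string;
  eventsPerMonth: number;
}

// stepsPerEvent defaults to 3.5, an assumed midpoint of the 3-4 step-run range.
function estimateStepRuns(sources: WebhookSource[], stepsPerEvent = 3.5): number {
  const events = sources.reduce((sum, s) => sum + s.eventsPerMonth, 0);
  return Math.round(events * stepsPerEvent);
}

const webhookSources: WebhookSource[] = [
  { name: "Stripe", eventsPerMonth: 5_000 },
  { name: "GitHub", eventsPerMonth: 3_000 },
  { name: "Linear", eventsPerMonth: 1_500 },
];

console.log(estimateStepRuns(webhookSources.slice(0, 1))); // 17500 (Stripe alone)
console.log(estimateStepRuns(webhookSources)); // 33250
console.log(estimateStepRuns(webhookSources, 3), estimateStepRuns(webhookSources, 4)); // 28500 38000
```

Depending on how many steps each event takes, the three sources land between 28,500 and 38,000 step-runs a month.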

That fits Rotor Starter ($29/mo). The same workflow on alternatives:

  • Vercel cron + Sentry for observability: free in compute, but no native idempotency, no concurrency keys, and no replay. You write all of it from scratch and maintain it forever.
  • Zapier: 30k tasks at Professional pricing. The per-customer ordering and idempotency are not native primitives — you'd build them out of band.

Fork this playbook on Rotor.

$9 to start. 30-day money back. Hard caps protect you from runaway bills.
