Playbook

The Slack notification queue that doesn't spam.

Debounce noisy alerts. Batch routine ones. Escalate the rest. Severity levels and de-duplication built in.

The pain

Why your Slack alerts get muted

A flaky webhook handler triggers the same alert 50 times in a minute. The on-call engineer mutes the channel. Three weeks later, a real incident fires in the same channel and nobody sees it for four hours.

The four ways naive Slack notifications break:

  • No deduplication. Same alert fires 50x. Channel becomes unreadable. Engineers mute it.
  • No severity routing. "Pipeline build started" and "Production database is down" both go to the same channel with the same formatting.
  • No batching for routine events. Every customer signup pings #signups. 200 signups a day = 200 pings = nobody reads them.
  • No escalation for unread criticals. Critical alert fires at 3am. Channel posts. Engineer sleeps. No DM, no page, no ack-or-escalate flow.

The architecture

What we're building

One workflow per inbound alert, with three policies it can apply:

  • Dedup window. Identical alerts within 5 minutes collapse to one Slack message with a count.
  • Severity routing. info goes to a daily batch summary. warn goes to a channel. critical goes to channel plus DM plus a timer that escalates if no one acknowledges.
  • Format per severity. Info gets one line. Warn gets a structured block. Critical gets a banner with action buttons.
workflow({
  id: "slack-alert",
  trigger: { event: "alert.fired" },

  steps: async ({ event, step }) => {
    const alert = event.data;

    // Dedup: identical alerts collapse
    const dedupKey = `${alert.severity}-${alert.title}`;
    const isDuplicate = await step.run("check-dedup", () =>
      kv.exists(`alert-seen:${dedupKey}`, { ttl: "5m" })
    );

    if (isDuplicate) {
      await step.run("increment-count", () =>
        kv.incr(`alert-count:${dedupKey}`)
      );
      return { suppressed: true };
    }

    await step.run("mark-seen", () =>
      kv.set(`alert-seen:${dedupKey}`, "1", { ttl: "5m" })
    );

    // Route by severity
    switch (alert.severity) {
      case "info":
        return await step.run("batch-info", () =>
          db.alertBatch.insert({ alert, fires_at: nextDailyDigest() })
        );

      case "warn":
        return await step.run("send-warn", () =>
          slack.send(alert.channel, formatWarn(alert))
        );

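      case "critical":
        return await handleCritical(step, alert);

      default:
        // Unknown severity: fail loudly instead of silently dropping the alert
        throw new Error(`Unknown severity: ${alert.severity}`);
    }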
  }
});
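
The per-severity formats described above might look like the following sketch. The workflow calls formatWarn and formatCritical but does not show them, so the bodies here are assumptions, as are the alert.detail field, the formatInfo helper, and the "alert_ack" action_id. The block shapes (header, section, actions, button) follow Slack's Block Kit format.

```javascript
// Hypothetical formatters: alert.detail and the "alert_ack" action_id are assumptions.
function formatInfo(alert) {
  // Info: a single plain line, suitable for a digest entry.
  return { text: `info: ${alert.title}` };
}

function formatWarn(alert) {
  // Warn: one structured section block, with fallback text for notifications.
  return {
    text: `Warning: ${alert.title}`,
    blocks: [
      {
        type: "section",
        text: { type: "mrkdwn", text: `*Warning: ${alert.title}*\n${alert.detail ?? ""}` },
      },
    ],
  };
}

function formatCritical(alert, ackId) {
  // Critical: header banner, detail section, and an Acknowledge button.
  return {
    text: `CRITICAL: ${alert.title}`,
    blocks: [
      {
        type: "header",
        // Header text is limited to 150 characters, so truncate defensively.
        text: { type: "plain_text", text: `CRITICAL: ${alert.title}`.slice(0, 150) },
      },
      {
        type: "section",
        text: { type: "mrkdwn", text: alert.detail ?? "No details provided." },
      },
      {
        type: "actions",
        elements: [
          {
            type: "button",
            text: { type: "plain_text", text: "Acknowledge" },
            style: "danger",
            action_id: "alert_ack",
            value: ackId, // carried back in the interaction payload when clicked
          },
        ],
      },
    ],
  };
}
```

Putting the ackId in the button's value means the click itself identifies which alert is being acknowledged, with no lookup needed on the handler side.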

Critical alerts

The escalation flow

Critical alerts get a four-step flow: post to channel, DM the on-call, wait for ack with a timeout, escalate if no ack.

async function handleCritical(step, alert) {
  // 1. Post to channel with action buttons
  const ackId = `ack-${alert.id}`;
  await step.run("post-critical", () =>
    slack.send(alert.channel, formatCritical(alert, ackId))
  );

  // 2. DM the on-call
  const oncall = await step.run("get-oncall", () =>
    pagerduty.currentOncall(alert.team)
  );
  await step.run("dm-oncall", () =>
    slack.dm(oncall.slack_id, `Critical: ${alert.title}. Ack in #alerts.`)
  );

  // 3. Wait for ack: durable, no compute cost while waiting.
  // The ack event is expected to carry data.ackId so it can be matched to this alert.
  const ack = await step.waitForEvent("wait-ack", {
    event: "alert.acked",
    match: "data.ackId",
    timeout: "10m",
  });

  if (ack) {
    await step.run("post-acked", () =>
      slack.send(alert.channel, `Acked by <@${ack.data.user}>`)
    );
    return { acked: true };
  }

  // 4. Escalate
  const manager = await step.run("get-manager", () =>
    pagerduty.managerOf(oncall.id)
  );
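  await step.run("dm-manager", () =>
    slack.dm(manager.slack_id, `No ack on critical alert "${alert.title}" in 10m. Escalating.`)
  );
  // Return a result on this path too, so callers can tell escalation happened
  return { acked: false, escalated: true };
}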

The step.waitForEvent call is the load-bearing part. The workflow suspends for up to 10 minutes with zero compute cost. When the on-call clicks the Slack ack button, your interaction handler sends an alert.acked event and the workflow resumes exactly where it left off. If no event arrives within the timeout, ack is empty and the flow falls through to escalation.
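
The interaction handler is not shown here, so the following is only a sketch of what it might do. It assumes the ack button uses the action_id "alert_ack" and carries the ackId in its value (both assumptions), that events have a { name, data } shape, and that sendEvent is whatever event emitter your workflow engine provides; it is passed in as a parameter because the source does not name one. The payload fields read here (type, actions, user.id) follow Slack's block_actions interaction payload.

```javascript
// Turns a Slack block_actions payload into an alert.acked event.
// `sendEvent` is a hypothetical emitter supplied by the caller.
async function handleSlackInteraction(payload, sendEvent) {
  // Ignore anything that is not a button/select interaction.
  if (payload.type !== "block_actions") return null;

  // Find the ack button among the actions in this interaction.
  const action = (payload.actions ?? []).find((a) => a.action_id === "alert_ack");
  if (!action) return null;

  const event = {
    name: "alert.acked",
    // data.ackId is what the waiting workflow matches on;
    // data.user is what the "Acked by" message mentions.
    data: { ackId: action.value, user: payload.user.id },
  };
  await sendEvent(event);
  return event;
}
```

The handler stays stateless: everything the workflow needs to resume travels inside the event.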

Edge cases

What goes wrong, and how to handle it

Alert storm during an outage. Same severity, 200 alerts in 30 seconds. The dedup window collapses identical alerts. For non-identical alerts in the same incident, group by incident_id and send one batched message.
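
Grouping by incident_id could be sketched as follows. The function names, the cap of five titles per message, and the message wording are illustrative assumptions; only the incident_id and title fields come from the text above.

```javascript
// Groups alerts by incident_id. Alerts without one each form their own group.
function groupByIncident(alerts) {
  const groups = new Map();
  for (const alert of alerts) {
    // A fresh Symbol gives ungrouped alerts a key that can never collide.
    const key = alert.incident_id ?? Symbol("solo");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(alert);
  }
  return [...groups.values()];
}

// Renders one batched message per group, listing distinct titles.
function formatIncidentBatch(group, maxTitles = 5) {
  const titles = [...new Set(group.map((a) => a.title))];
  const shown = titles.slice(0, maxTitles).map((t) => `• ${t}`).join("\n");
  const more = titles.length > maxTitles ? `\n...and ${titles.length - maxTitles} more` : "";
  const id = group[0].incident_id ?? "unassigned";
  return `Incident ${id}: ${group.length} alerts\n${shown}${more}`;
}
```

In a storm of 200 alerts that share an incident, this turns 200 posts into one message with a count and a short list of distinct titles.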

Slack rate limit. Slack allows roughly 1 message per second per channel, with short bursts tolerated. Add a per-channel concurrency limit so sends to the same channel run one at a time:

concurrency: { key: "event.data.channel", limit: 1 }
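
A concurrency limit serializes sends per channel, but it may not by itself space them a second apart. If explicit spacing is needed, one option is a small per-channel throttle like the sketch below. The function name and the in-memory Map are assumptions, and the state lives in a single process, so a multi-worker setup would need a shared store instead.

```javascript
// Tracks the next allowed send time per channel and returns how long to wait.
function createChannelThrottle(minIntervalMs = 1000) {
  const nextFree = new Map(); // channel -> timestamp (ms) when it may send again

  return function reserve(channel, now = Date.now()) {
    // Send immediately if the channel is free, otherwise at its next free slot.
    const sendAt = Math.max(now, nextFree.get(channel) ?? 0);
    nextFree.set(channel, sendAt + minIntervalMs);
    return sendAt - now; // milliseconds the caller should wait before sending
  };
}
```

The caller reserves a slot, sleeps for the returned delay, then sends. Channels are tracked independently, so a burst in one channel does not delay another.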

Daily digest scheduling. Run a separate workflow on a daily cron that reads the alertBatch table, formats a summary, and posts it. One message per day, not 200.
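
The formatting step of that digest workflow might look like this sketch. It assumes each row has the { alert } shape inserted by the batch-info step; the function name, the sort order, and the message wording are illustrative.

```javascript
// Builds one digest message from batched info alerts.
// Rows are assumed to look like { alert: { title } }.
function formatDailyDigest(rows) {
  if (rows.length === 0) return null; // nothing to post today

  // Count occurrences of each title.
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.alert.title, (counts.get(row.alert.title) ?? 0) + 1);
  }

  const lines = [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .map(([title, n]) => `• ${title} (${n})`);

  return `Daily digest: ${rows.length} info alerts\n${lines.join("\n")}`;
}
```

Returning null for an empty batch lets the cron workflow skip posting on quiet days.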

On-call rotation changes. Cache PagerDuty lookups for 5 minutes max. Anything longer and you DM yesterday's on-call during today's incident.
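
A 5-minute cache around the lookup can be as small as the sketch below. The cachedLookup name and the injectable clock are assumptions added for illustration and testing, and the cache is per-process.

```javascript
// Wraps an async lookup with a per-key cache that expires after ttlMs.
function cachedLookup(fetchFn, ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
  const cache = new Map(); // key -> { value, expiresAt }

  return async function lookup(key) {
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) return hit.value; // still fresh

    // Missing or expired: fetch again and reset the expiry.
    const value = await fetchFn(key);
    cache.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

In the escalation flow this would wrap the existing call, for example cachedLookup((team) => pagerduty.currentOncall(team)), so repeated criticals for one team inside five minutes reuse the same answer.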

The math

What this costs on Rotor

A typical mid-size team fires ~2,000 alerts a day. With dedup, ~600 actually post. Each posted alert = 4-6 step-runs. ~3,000 step-runs/day. ~90,000/month.
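
The arithmetic behind that estimate, worked through for the low, middle, and high ends of the 4-6 range. The 30-day month is an assumption, and the function name is illustrative.

```javascript
// Reproduces the estimate above; the 30-day month is an assumption.
function estimateStepRuns({ postedPerDay, stepsPerPosted, daysPerMonth = 30 }) {
  const perDay = postedPerDay * stepsPerPosted;
  return { perDay, perMonth: perDay * daysPerMonth };
}

const low = estimateStepRuns({ postedPerDay: 600, stepsPerPosted: 4 });
const mid = estimateStepRuns({ postedPerDay: 600, stepsPerPosted: 5 });
const high = estimateStepRuns({ postedPerDay: 600, stepsPerPosted: 6 });
// low:  2,400/day,  72,000/month
// mid:  3,000/day,  90,000/month
// high: 3,600/day, 108,000/month
```

This counts only posted alerts, as the estimate does. Suppressed duplicates still run the check-dedup and increment-count steps in the workflow, so if those count as billable step-runs the real total would be higher; whether they do is not stated here.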

That fits Rotor Pro ($99/mo). Compared with the alternatives:

  • Building it on a worker yourself: free in compute, plus the time you spend tuning the dedup logic and writing the escalation flow from scratch. Two engineer-days minimum, then ongoing maintenance.
  • Zapier: 90k tasks/month would push you to the $200+/mo task tier with no native escalation primitive. The wait-for-ack flow is impossible without writing your own state.

Fork this playbook on Rotor.

$9 to start. 30-day money back. Hard caps protect you from runaway bills.
