Incident Response Policy

Status: ACTIVE
Owner: Daan ([email protected])
Effective: 2026-04-20
Last reviewed: 2026-04-20
Next review: 2026-10-20

Purpose

Define the process for detecting, responding to, and communicating security incidents and service disruptions affecting rotor.sh customers.

Scope

This policy covers all security incidents (data breaches, credential exposure, unauthorized access) and significant service disruptions (more than 1 hour of downtime, data loss, billing errors).

Definitions

  • P0 - Critical: Data breach, auth bypass, billing fraud, or complete service outage. Response starts within 1 hour.
  • P1 - High: Partial service outage, significant performance degradation (latency more than 2x the normal baseline), or credential exposure. Response starts within 4 hours.
  • P2 - Medium: Non-critical bug affecting a subset of customers. Response starts within 24 hours.
  • P3 - Low: Cosmetic issues, documentation errors. Response starts within 72 hours.
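
The response targets above can be encoded as data so that tooling can flag an incident whose response is overdue. The following is a minimal sketch, not an existing rotor.sh module: the names RESPONSE_HOURS, responseDeadline, and isResponseOverdue are hypothetical, and it assumes the clock starts at detection time, which the policy does not state explicitly.

```typescript
// Severity levels and response targets copied from the definitions above.
type Severity = "P0" | "P1" | "P2" | "P3";

const RESPONSE_HOURS: Record<Severity, number> = {
  P0: 1,
  P1: 4,
  P2: 24,
  P3: 72,
};

// Latest time by which the response should have started.
// Assumption: the clock starts when the incident is detected.
function responseDeadline(severity: Severity, detectedAt: Date): Date {
  const windowMs = RESPONSE_HOURS[severity] * 60 * 60 * 1000;
  return new Date(detectedAt.getTime() + windowMs);
}

// True once the deadline has passed.
function isResponseOverdue(severity: Severity, detectedAt: Date, now: Date): boolean {
  return now.getTime() > responseDeadline(severity, detectedAt).getTime();
}
```

For example, a P1 detected at 10:00 UTC has a response deadline of 14:00 UTC the same day.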

Incident Response Process

1. Detection

Incidents may be detected via:

  • Sentry error alerts (apps/api, apps/worker)
  • Railway service health monitors
  • Customer reports to [email protected] or [email protected]
  • Vanta compliance monitoring alerts
  • Internal monitoring (BullMQ job failure spikes, Postgres connection errors)
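
The policy does not define what counts as a "job failure spike". One possible reading, sketched below, compares the failure rate in a recent window against a longer baseline window. Every threshold and name here (MIN_JOBS, SPIKE_MULTIPLIER, MIN_FAILURE_RATE, isFailureSpike) is a hypothetical placeholder to be tuned, and the sketch deliberately takes plain counts as input rather than calling BullMQ directly.

```typescript
interface WindowStats {
  failed: number; // jobs that failed in the window
  total: number;  // jobs that finished (completed or failed) in the window
}

// Hypothetical thresholds; the policy does not specify any of these values.
const MIN_JOBS = 20;           // ignore windows too small to be meaningful
const SPIKE_MULTIPLIER = 3;    // recent rate must exceed baseline by this factor
const MIN_FAILURE_RATE = 0.05; // floor so a near-zero baseline does not alert on noise

function failureRate(w: WindowStats): number {
  return w.total === 0 ? 0 : w.failed / w.total;
}

// Flags a spike when the recent failure rate is well above the baseline rate.
function isFailureSpike(recent: WindowStats, baseline: WindowStats): boolean {
  if (recent.total < MIN_JOBS) return false;
  const threshold = Math.max(MIN_FAILURE_RATE, failureRate(baseline) * SPIKE_MULTIPLIER);
  return failureRate(recent) > threshold;
}
```

A ratio-based check with a floor is only one design choice; a fixed absolute count per minute may be simpler if job volume is steady.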

2. Triage (within 1h for P0, 4h for P1)

  1. Confirm the incident is real (not a false positive).
  2. Classify severity (P0–P3).
  3. Identify affected scope: which workspaces, customers, data types.
  4. Open an incident Slack channel: #incident-YYYY-MM-DD-brief-description.
  5. Assign an incident commander (IC).
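
The channel naming convention in step 4 is easy to get slightly wrong under pressure, so it may help to generate the name. This is a minimal sketch: incidentChannelName is a hypothetical helper, and using the UTC date is an assumption (the policy only specifies UTC for disclosure timestamps). The leading "#" is left out because Slack adds it when displaying the channel.

```typescript
// Builds a channel name of the form incident-YYYY-MM-DD-brief-description.
function incidentChannelName(openedAt: Date, description: string): string {
  // Assumption: the date part is the UTC calendar date.
  const date = openedAt.toISOString().slice(0, 10);
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything else into a single hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
  // Slack channel names are limited to 80 characters.
  return `incident-${date}-${slug}`.slice(0, 80).replace(/-+$/, "");
}
```

For example, an incident opened on 2026-04-20 (UTC) and described as "API Auth Bypass!" yields incident-2026-04-20-api-auth-bypass.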

3. Containment

  • For credential exposure: rotate affected secrets immediately (WEBHOOK_SECRET_ENCRYPTION_KEY, Supabase service key, Railway tokens).
  • For unauthorized access: revoke affected API keys / sessions via Supabase admin.
  • For service disruption: engage Railway/Fly.io support and, if data integrity is at risk, activate the kill-switch (BIL-06) for affected workspaces.

4. Eradication and Recovery

  • Identify root cause; implement the minimum fix.
  • Validate the fix in staging before applying it to production.
  • Apply the fix via the emergency hotfix procedure (see Change Management Policy).
  • Restore service; verify affected customers can access their data.

5. Customer Communication (P0 / P1 — 24h disclosure commitment)

We commit to disclosing confirmed breaches affecting customer data within 24 hours of confirmation.

Communication channels:

  • Email to affected workspace admins (sourced from Supabase team_member table).
  • Status page update at status.rotor.sh.
  • For Enterprise accounts that have a CSM: direct phone or Slack contact.

Disclosure must include:

  • What happened and when (UTC timestamps).
  • What data was affected.
  • What we have done to contain it.
  • What customers should do (e.g. rotate API keys).
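
The 24-hour commitment and the required disclosure contents can both be checked mechanically before a notice goes out. The sketch below is illustrative only: the Disclosure shape, its field names, and the two helper functions are assumptions that mirror the list above, not an existing rotor.sh schema.

```typescript
// Hypothetical shape mirroring the required disclosure contents above.
interface Disclosure {
  whatHappened: string;
  occurredAtUtc: string;      // ISO 8601 timestamp ending in "Z"
  dataAffected: string;
  containmentActions: string;
  customerActions: string;    // e.g. "rotate API keys"
}

const DISCLOSURE_WINDOW_HOURS = 24;

const REQUIRED_FIELDS: (keyof Disclosure)[] = [
  "whatHappened",
  "occurredAtUtc",
  "dataAffected",
  "containmentActions",
  "customerActions",
];

// Latest send time: 24 hours after the breach was confirmed.
function disclosureDeadline(confirmedAt: Date): Date {
  return new Date(confirmedAt.getTime() + DISCLOSURE_WINDOW_HOURS * 60 * 60 * 1000);
}

// Returns a list of problems; an empty list means the notice is complete.
function disclosureProblems(d: Disclosure): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    if (d[field].trim() === "") problems.push(`${field} is empty`);
  }
  const ts = d.occurredAtUtc.trim();
  if (ts !== "" && (ts.slice(-1) !== "Z" || isNaN(Date.parse(ts)))) {
    problems.push("occurredAtUtc is not a UTC timestamp");
  }
  return problems;
}
```

A breach confirmed at 09:15 UTC on 2026-04-20 would need to be disclosed by 09:15 UTC on 2026-04-21.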

6. Post-Incident Review

Within 5 business days of P0/P1 resolution:

  • Write a blameless post-mortem (5 Whys format).
  • Document timeline, contributing factors, and action items.
  • Share with affected customers on request.
  • Add action items to engineering backlog with due dates.
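
The 5-business-day window can be turned into a concrete due date when the incident is resolved. This is a minimal sketch with hypothetical names (addBusinessDays, postMortemDue); it assumes business days are Monday to Friday in UTC and does not account for public holidays.

```typescript
const POST_MORTEM_BUSINESS_DAYS = 5;

// Adds business days (Monday to Friday, evaluated in UTC).
// Assumption: public holidays are not treated as non-business days.
function addBusinessDays(start: Date, days: number): Date {
  const result = new Date(start.getTime());
  let remaining = days;
  while (remaining > 0) {
    result.setUTCDate(result.getUTCDate() + 1);
    const dayOfWeek = result.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (dayOfWeek !== 0 && dayOfWeek !== 6) remaining -= 1;
  }
  return result;
}

// Due date for the post-mortem of a resolved P0/P1 incident.
function postMortemDue(resolvedAt: Date): Date {
  return addBusinessDays(resolvedAt, POST_MORTEM_BUSINESS_DAYS);
}
```

For example, an incident resolved on Wednesday 2026-04-22 has its post-mortem due on Wednesday 2026-04-29.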

Escalation Contacts

Role                 Contact
Incident Commander   [email protected]
Supabase support     support.supabase.com
Railway support      railway.app/help
Fly.io support       fly.io/docs/support
Sentry on-call       via Sentry alerting rules

Review Cadence

Reviewed every six months. Next review: 2026-10-20.