S3 Audit Export — Rotor Docs

Warning

Audit-event data is available now via GET /v1/audit?format=csv. Automated push to S3 is an Enterprise-tier add-on shipping in Phase 4 — until then, customers can download CSV exports from GET /v1/audit?format=csv and upload them manually to their S3 bucket using the schema and prefix layout documented below.

Output Schema

Parquet files produced by Rotor follow this column schema, derived from the audit_event Postgres table (shipped in plan 02-01):

Column	Parquet type	Nullable	Notes
`id`	STRING	No	UUID (hyphen-separated)
`workspace_id`	STRING	No	Workspace key (`ws_...`)
`event_type`	STRING	No	e.g. `job.completed`, `guardrail.blocked`
`actor_id`	STRING	No	API key ID or `system`
`resource_id`	STRING	Yes	Affected resource ID (job ID, webhook ID)
`metadata`	STRING (JSON-encoded)	Yes	Event-specific payload as JSON string
`occurred_at`	INT96 (TIMESTAMP_MICROS)	No	UTC; partition key
`ip_address`	STRING	Yes	IPv4 or IPv6

metadata is written as a JSON string (not a nested struct) to maximise compatibility across Databricks, BigQuery, and Redshift, all of which handle JSON strings with native functions (JSON_VALUE, from_json, JSON_EXTRACT_PATH_TEXT).

Bucket Layout

Files are written in Hive-style partitioning for compatibility with Athena, BigQuery wildcard queries, and Databricks Auto Loader:

s3://customer-bucket/rotor-audit/
  workspace=ws_abc123/
    year=2026/
      month=04/
        day=13/
          part-00000.parquet
          part-00001.parquet
          _checksum.sha256       ← row count + sha256 of all parts

Each part-NNNNN.parquet file is at most 128 MB uncompressed (snappy-compressed in practice). A _checksum.sha256 file is written atomically after all parts for a day complete, enabling idempotent detection of complete vs. in-progress drops.

Schedule

By default, Rotor exports the previous day's events at 02:00 UTC each day. The schedule is configurable per workspace (Enterprise tier). Contact [email protected] to change frequency or timezone.

IAM Setup

Rotor assumes a customer-provided IAM role to write to your S3 bucket. The role needs only s3:PutObject on the audit prefix — Rotor never reads from your bucket:

Step 1: Create the IAM role

Create a new IAM role with a trust policy that allows Rotor's AWS account to assume it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<PROVIDED_BY_ROTOR>"
        }
      }
    }
  ]
}

Replace 123456789012 with Rotor's AWS account ID (provided during Enterprise onboarding). The ExternalId is unique per workspace and is provided by Rotor to prevent confused-deputy attacks.

Step 2: Attach a minimal permissions policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::customer-bucket/rotor-audit/*"
    }
  ]
}

Step 3: Provide Rotor with the role ARN

Contact [email protected] with your:

AWS account ID
IAM role ARN
S3 bucket name and region
Preferred export schedule (default: daily at 02:00 UTC)

Athena Example Query

Once files are in S3, register a Glue table pointing to the Hive partition layout:

CREATE EXTERNAL TABLE rotor_audit (
  id           STRING,
  workspace_id STRING,
  event_type   STRING,
  actor_id     STRING,
  resource_id  STRING,
  metadata     STRING,
  occurred_at  TIMESTAMP,
  ip_address   STRING
)
PARTITIONED BY (year STRING, month STRING, day STRING)
STORED AS PARQUET
LOCATION 's3://customer-bucket/rotor-audit/workspace=ws_abc123/'
TBLPROPERTIES ('parquet.compression'='SNAPPY');
 
-- Load partitions
MSCK REPAIR TABLE rotor_audit;

Run queries filtered to a specific event type:

-- Find all guardrail blocks for a workspace in April
SELECT occurred_at, actor_id, JSON_EXTRACT_SCALAR(metadata, '$.reason') AS reason
FROM rotor_audit
WHERE workspace = 'ws_abc123'
  AND event_type = 'guardrail.blocked'
  AND year = '2026' AND month = '04'
ORDER BY occurred_at DESC
LIMIT 100;

Cost estimate: Athena charges $5/TB scanned. A 10M-row workspace generates ~20 MB parquet/day; scanning one month of data costs approximately $0.003.

BigQuery Transfer

Use BigQuery Data Transfer Service to ingest S3 parquet files into BigQuery automatically:

In BigQuery, go to Data Transfers → Create Transfer.
Select Amazon S3 as the source.
Set the source URI pattern: s3://customer-bucket/rotor-audit/workspace=<ws_id>/year=*/month=*/day=*/*.parquet
Configure the service account with cross-account S3 read access.
Set schedule to Daily starting at 04:00 UTC (after Rotor's 02:00 UTC export completes).

BigQuery automatically infers the Parquet schema. The metadata column is ingested as STRING and can be queried with JSON_VALUE(metadata, '$.reason').

Databricks Auto Loader

Databricks Auto Loader natively supports S3 + Hive partitioning:

df = (
  spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/rotor-audit-schema")
    .load("s3://customer-bucket/rotor-audit/workspace=ws_abc123/")
)
 
df.writeStream \
  .format("delta") \
  .option("checkpointLocation", "/mnt/checkpoints/rotor-audit") \
  .table("rotor_audit_events")

Auto Loader processes only new files on each trigger, making it efficient for daily batch drops as well as near-real-time streaming if Rotor is configured for hourly exports.

Redshift Spectrum

For Redshift customers, use Redshift Spectrum to query directly from S3 without ingestion:

-- Create external schema pointing to your S3 audit data
CREATE EXTERNAL SCHEMA rotor_audit
FROM DATA CATALOG
DATABASE 'rotor_audit_db'
IAM_ROLE 'arn:aws:iam::<your-account>:role/RedshiftSpectrumRole';
 
-- Query directly
SELECT event_type, COUNT(*) as cnt
FROM rotor_audit.audit_events
WHERE workspace_id = 'ws_abc123'
  AND occurred_at > GETDATE() - INTERVAL '30 days'
GROUP BY 1
ORDER BY cnt DESC;

Verification

After each daily export, verify completeness using the _checksum.sha256 file:

# Download checksum file
aws s3 cp \
  s3://customer-bucket/rotor-audit/workspace=ws_abc123/year=2026/month=04/day=13/_checksum.sha256 \
  ./checksum.sha256
 
# File format: "<row_count> <sha256_hex_of_all_parts>"
cat checksum.sha256
# Example: 142857 a3f9...
 
# Compare with Postgres source count
psql $DATABASE_URL -c "
  SELECT COUNT(*)
  FROM audit_event
  WHERE workspace_id = 'ws_abc123'
    AND occurred_at::date = '2026-04-13'
"

Row counts should match. Contact [email protected] if you observe consistent discrepancies greater than 0.01%.