Audit-event data is available now via GET /v1/audit?format=csv. Automated push to
S3 is an Enterprise-tier add-on shipping in Phase 4 — until then, customers can download
CSV exports from GET /v1/audit?format=csv and upload them manually to their S3 bucket
using the schema and prefix layout documented below.
Output Schema
Parquet files produced by Rotor follow this column schema, derived from the audit_event
Postgres table (shipped in plan 02-01):
| Column | Parquet type | Nullable | Notes |
|---|---|---|---|
id | STRING | No | UUID (hyphen-separated) |
workspace_id | STRING | No | Workspace key (ws_...) |
event_type | STRING | No | e.g. job.completed, guardrail.blocked |
actor_id | STRING | No | API key ID or system |
resource_id | STRING | Yes | Affected resource ID (job ID, webhook ID) |
metadata | STRING (JSON-encoded) | Yes | Event-specific payload as JSON string |
occurred_at | INT96 (TIMESTAMP_MICROS) | No | UTC; partition key |
ip_address | STRING | Yes | IPv4 or IPv6 |
metadata is written as a JSON string (not a nested struct) to maximise compatibility across
Databricks, BigQuery, and Redshift, all of which handle JSON strings with native functions
(JSON_VALUE, from_json, JSON_EXTRACT_PATH_TEXT).
Bucket Layout
Files are written in Hive-style partitioning for compatibility with Athena, BigQuery wildcard queries, and Databricks Auto Loader:
s3://customer-bucket/rotor-audit/
workspace=ws_abc123/
year=2026/
month=04/
day=13/
part-00000.parquet
part-00001.parquet
_checksum.sha256 ← row count + sha256 of all parts
Each part-NNNNN.parquet file is at most 128 MB uncompressed (snappy-compressed in practice).
A _checksum.sha256 file is written atomically after all parts for a day complete, enabling
idempotent detection of complete vs. in-progress drops.
Schedule
By default, Rotor exports the previous day's events at 02:00 UTC each day. The schedule is configurable per workspace (Enterprise tier). Contact [email protected] to change frequency or timezone.
IAM Setup
Rotor assumes a customer-provided IAM role to write to your S3 bucket. The role needs only
s3:PutObject on the audit prefix — Rotor never reads from your bucket:
Step 1: Create the IAM role
Create a new IAM role with a trust policy that allows Rotor's AWS account to assume it:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::123456789012:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "<PROVIDED_BY_ROTOR>"
}
}
}
]
}Replace 123456789012 with Rotor's AWS account ID (provided during Enterprise onboarding).
The ExternalId is unique per workspace and is provided by Rotor to prevent confused-deputy
attacks.
Step 2: Attach a minimal permissions policy
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Resource": "arn:aws:s3:::customer-bucket/rotor-audit/*"
}
]
}Step 3: Provide Rotor with the role ARN
Contact [email protected] with your:
- AWS account ID
- IAM role ARN
- S3 bucket name and region
- Preferred export schedule (default: daily at 02:00 UTC)
Athena Example Query
Once files are in S3, register a Glue table pointing to the Hive partition layout:
CREATE EXTERNAL TABLE rotor_audit (
id STRING,
workspace_id STRING,
event_type STRING,
actor_id STRING,
resource_id STRING,
metadata STRING,
occurred_at TIMESTAMP,
ip_address STRING
)
PARTITIONED BY (year STRING, month STRING, day STRING)
STORED AS PARQUET
LOCATION 's3://customer-bucket/rotor-audit/workspace=ws_abc123/'
TBLPROPERTIES ('parquet.compression'='SNAPPY');
-- Load partitions
MSCK REPAIR TABLE rotor_audit;Run queries filtered to a specific event type:
-- Find all guardrail blocks for a workspace in April
SELECT occurred_at, actor_id, JSON_EXTRACT_SCALAR(metadata, '$.reason') AS reason
FROM rotor_audit
WHERE workspace = 'ws_abc123'
AND event_type = 'guardrail.blocked'
AND year = '2026' AND month = '04'
ORDER BY occurred_at DESC
LIMIT 100;Cost estimate: Athena charges $5/TB scanned. A 10M-row workspace generates ~20 MB parquet/day; scanning one month of data costs approximately $0.003.
BigQuery Transfer
Use BigQuery Data Transfer Service to ingest S3 parquet files into BigQuery automatically:
- In BigQuery, go to Data Transfers → Create Transfer.
- Select Amazon S3 as the source.
- Set the source URI pattern:
s3://customer-bucket/rotor-audit/workspace=<ws_id>/year=*/month=*/day=*/*.parquet - Configure the service account with cross-account S3 read access.
- Set schedule to Daily starting at 04:00 UTC (after Rotor's 02:00 UTC export completes).
BigQuery automatically infers the Parquet schema. The metadata column is ingested as
STRING and can be queried with JSON_VALUE(metadata, '$.reason').
Databricks Auto Loader
Databricks Auto Loader natively supports S3 + Hive partitioning:
df = (
spark.readStream
.format("cloudFiles")
.option("cloudFiles.format", "parquet")
.option("cloudFiles.schemaLocation", "/mnt/checkpoints/rotor-audit-schema")
.load("s3://customer-bucket/rotor-audit/workspace=ws_abc123/")
)
df.writeStream \
.format("delta") \
.option("checkpointLocation", "/mnt/checkpoints/rotor-audit") \
.table("rotor_audit_events")Auto Loader processes only new files on each trigger, making it efficient for daily batch drops as well as near-real-time streaming if Rotor is configured for hourly exports.
Redshift Spectrum
For Redshift customers, use Redshift Spectrum to query directly from S3 without ingestion:
-- Create external schema pointing to your S3 audit data
CREATE EXTERNAL SCHEMA rotor_audit
FROM DATA CATALOG
DATABASE 'rotor_audit_db'
IAM_ROLE 'arn:aws:iam::<your-account>:role/RedshiftSpectrumRole';
-- Query directly
SELECT event_type, COUNT(*) as cnt
FROM rotor_audit.audit_events
WHERE workspace_id = 'ws_abc123'
AND occurred_at > GETDATE() - INTERVAL '30 days'
GROUP BY 1
ORDER BY cnt DESC;Verification
After each daily export, verify completeness using the _checksum.sha256 file:
# Download checksum file
aws s3 cp \
s3://customer-bucket/rotor-audit/workspace=ws_abc123/year=2026/month=04/day=13/_checksum.sha256 \
./checksum.sha256
# File format: "<row_count> <sha256_hex_of_all_parts>"
cat checksum.sha256
# Example: 142857 a3f9...
# Compare with Postgres source count
psql $DATABASE_URL -c "
SELECT COUNT(*)
FROM audit_event
WHERE workspace_id = 'ws_abc123'
AND occurred_at::date = '2026-04-13'
"Row counts should match. Contact [email protected] if you observe consistent discrepancies greater than 0.01%.