Playbook

The lead enrichment pipeline that scales.

Apollo plus Clay enrichment, batched, idempotent, with hard cost caps. Each enrichment is its own step. Failures don't re-bill APIs that already returned data.


The pain

Why naive enrichment burns budget

A new lead enters HubSpot. You want their company size, tech stack, and a verified email. You call Apollo. Then Clay. Then Clay again for tech stack. Three API calls per lead. Each costs money.

A simple script runs all three calls inline. If the third call fails, the script retries the whole thing. Apollo and Clay charge you again for data they already returned.

The four ways naive enrichment burns money:

  • Whole-batch retries. One Clay timeout in a batch of 200 leads forces a full retry. You pay again for the 199 enrichments that already succeeded.
  • No deduplication. A user submits the same form twice. Two enrichment runs. Two API bills for the same lead.
  • No rate-limit awareness. Clay rate-limits at 60 requests per minute per workspace. A 500-lead batch sent in parallel hits the limit, and most calls fail. You retry. Most fail again.
  • No cost cap. A bug doubles your enrichment volume for a day. Nobody notices until the Clay invoice arrives. The invoice is $4,000.

The architecture

What we're building

A workflow triggered per lead. Three enrichment calls, each as its own step. Idempotent on lead ID. Cost-capped at the workflow level.

Three core principles:

  • Each enrichment call is a step. Apollo, Clay company info, and Clay tech stack each run in their own step.run. If Clay tech stack times out, only that step retries. Apollo and Clay company info are memoized.
  • Idempotency keyed per lead. Trigger the workflow with idempotencyKey: lead.id. Duplicate triggers (form submitted twice) collapse into one run.
  • Concurrency capped at the API rate limit. Set concurrency limits per provider so a 500-lead batch doesn't stampede Clay. Excess leads queue and run as capacity frees.
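
What a concurrency cap does is easy to see outside any workflow engine. The sketch below is a minimal in-process limiter, not Rotor's implementation: `createLimiter` is a hypothetical name, and a real engine would enforce the cap across machines rather than inside one process.

```typescript
// Hypothetical sketch: cap the number of in-flight calls for one provider.
// Callers beyond the limit wait in a FIFO queue until a slot frees up.
function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  const release = (): void => {
    const next = waiting.shift();
    if (next) {
      next(); // hand the freed slot straight to the next waiter
    } else {
      active--; // nobody waiting, so the slot becomes free
    }
  };

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Wait until a finishing task hands us its slot.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

Wrapping every Clay call in the function returned by `createLimiter(30)` would keep at most 30 requests in flight for a 500-lead batch. The rest wait their turn instead of failing against the rate limit.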

The workflow shape:

workflow({
  id: "enrich-lead",
  trigger: { event: "hubspot.contact.created" },

  concurrency: [
    { scope: "account", key: "clay", limit: 30 },
    { scope: "account", key: "apollo", limit: 50 }
  ],

  steps: async ({ event, step }) => {
    const lead = event.data;

    // Three independent enrichment calls, each retriable in isolation
    const [apolloData, clayCompany, clayTechStack] = await Promise.all([
      step.run("apollo-email-verify", () =>
        apollo.email.verify({ email: lead.email })
      ),
      step.run("clay-company-info", () =>
        clay.enrich({ domain: lead.domain, type: "company" })
      ),
      step.run("clay-tech-stack", () =>
        clay.enrich({ domain: lead.domain, type: "tech_stack" })
      ),
    ]);

    const merged = await step.run("merge-enrichments", () =>
      mergeEnrichments(lead, { apolloData, clayCompany, clayTechStack })
    );

    await step.run("write-back-to-hubspot", () =>
      hubspot.contacts.update(lead.id, merged, {
        idempotencyKey: `enrich-${lead.id}-${lead.created_at}`,
      })
    );
  },
});

Trigger this with the lead ID as the workflow's idempotency key. Replays don't re-run successful steps. The same lead triggered twice collapses into one run.
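
Why a replay doesn't re-bill can be sketched with an in-memory store. This illustrates the general durable-step pattern, not Rotor's internals: `RunState`, `makeStep`, and `stateFor` are hypothetical names, and the `Map` stands in for persistent storage.

```typescript
// Hypothetical sketch of step memoization. Results are stored per step ID,
// so a replay returns the stored output instead of calling the API again.
type RunState = Map<string, unknown>;

function makeStep(state: RunState) {
  return {
    async run<T>(id: string, fn: () => Promise<T>): Promise<T> {
      if (state.has(id)) return state.get(id) as T; // memoized: no second bill
      const result = await fn(); // a throw here leaves the step unrecorded
      state.set(id, result);
      return result;
    },
  };
}

// One RunState per idempotency key, so duplicate triggers share a run.
const runs = new Map<string, RunState>();

function stateFor(idempotencyKey: string): RunState {
  let state = runs.get(idempotencyKey);
  if (!state) {
    state = new Map<string, unknown>();
    runs.set(idempotencyKey, state);
  }
  return state;
}
```

If the Clay tech-stack step throws on the first attempt, a second attempt with the same key finds the Apollo result already stored and calls only Clay.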

Cost caps

Hard limits before the invoice arrives

Concurrency limits protect against rate-limit failures. Cost caps protect against runaway loops. Both belong in code.

Add a daily-spend tracker as a first step. If the team has burned their daily Apollo budget, skip Apollo and run with degraded enrichment.

steps: async ({ event, step, logger }) => {
  const lead = event.data;

  const todaysSpend = await step.run("get-spend-today", () =>
    db.enrichmentSpend.sumToday()
  );

  const APOLLO_DAILY_CAP = 500_00;  // $500 in cents
  const apolloEnabled = todaysSpend.apollo < APOLLO_DAILY_CAP;

  if (!apolloEnabled) {
    logger.warn("Apollo cap hit for today, skipping email verify", {
      todaysSpend: todaysSpend.apollo,
    });
    await step.run("alert-cap-hit", () =>
      slack.send("#enrichment-alerts", `Apollo daily cap hit ($${APOLLO_DAILY_CAP / 100}). Enrichment running degraded.`)
    );
  }

  const apolloData = apolloEnabled
    ? await step.run("apollo-email-verify", () => apollo.email.verify({ email: lead.email }))
    : null;

  // ... rest of enrichment ...
}

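The Slack alert is the load-bearing part. A cap that silently skips enrichment is worse than no cap. Tell someone. As written, the alert step runs for every lead that arrives after the cap is hit, so throttle it to once per day to keep the channel readable.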
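
The snippet above assumes something is writing spend records for `sumToday()` to read. A minimal sketch of such a ledger follows. `SpendLedger`, `record`, and `shouldAlert` are hypothetical names, the in-memory array stands in for a database table, and the per-call cost you record would come from your provider's pricing. It also tracks whether the cap alert has already fired today.

```typescript
type Provider = "apollo" | "clay";

// Hypothetical in-memory spend ledger. Amounts are in cents; days are UTC.
class SpendLedger {
  private entries: Array<{ provider: Provider; cents: number; day: string }> = [];
  private alerted = new Set<string>();

  private static dayOf(date: Date): string {
    return date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  }

  // Call this after each billable API response.
  record(provider: Provider, cents: number, at: Date = new Date()): void {
    this.entries.push({ provider, cents, day: SpendLedger.dayOf(at) });
  }

  // Per-provider totals for the UTC day containing `now`.
  sumToday(now: Date = new Date()): Record<Provider, number> {
    const today = SpendLedger.dayOf(now);
    const totals: Record<Provider, number> = { apollo: 0, clay: 0 };
    for (const entry of this.entries) {
      if (entry.day === today) totals[entry.provider] += entry.cents;
    }
    return totals;
  }

  // True only the first time a provider's cap is reported on a given day.
  shouldAlert(provider: Provider, now: Date = new Date()): boolean {
    const key = `${provider}:${SpendLedger.dayOf(now)}`;
    if (this.alerted.has(key)) return false;
    this.alerted.add(key);
    return true;
  }
}
```

Guarding the Slack call with `shouldAlert("apollo")` would send one message per day rather than one per lead.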

Edge cases

What goes wrong, and how to handle it

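Partial enrichment success. Apollo returns OK. Clay times out twice. Write what you have. Don't fail the whole workflow. Promise.all rejects as soon as one step throws, so have each enrichment step resolve to null once its retries are exhausted. The merge step takes nullable inputs and fills HubSpot with whatever it received.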
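
A minimal sketch of writing what you have: failed enrichments become null, and the merge copies only what came back. `orNull` and `mergeNonNull` are hypothetical helpers. The sketch assumes retries have already happened upstream, since catching the error too early would hide the failure from the engine's retry logic.

```typescript
// Hypothetical helper: turn a failed enrichment into null instead of a
// rejection, so Promise.all over several enrichments never throws.
async function orNull<T>(call: () => Promise<T>): Promise<T | null> {
  try {
    return await call();
  } catch (_err) {
    return null; // assumption: retries were already exhausted upstream
  }
}

// Hypothetical merge: copy fields only from sources that returned data.
function mergeNonNull(
  base: Record<string, unknown>,
  sources: Array<Record<string, unknown> | null>
): Record<string, unknown> {
  const merged: Record<string, unknown> = { ...base };
  for (const source of sources) {
    if (source !== null) Object.assign(merged, source);
  }
  return merged;
}
```

With Apollo succeeding and Clay tech stack failing, the merged record carries the verified email and simply omits the tech-stack fields.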

Lead deleted mid-enrichment. Check that the lead still exists in HubSpot at the top of the merge step. If it doesn't, skip both the merge and the write-back.

Same domain, different leads. Cache Clay company enrichments per domain. The first lead from acme.com pays the Clay bill. The next 50 leads from acme.com use the cached result.

const clayCompany = await step.run(`clay-company-${lead.domain}`, async () => {
  const cached = await db.clayCompanyCache.find({ domain: lead.domain });
  if (cached && !cached.expired) return cached.data;

  const fresh = await clay.enrich({ domain: lead.domain, type: "company" });
  await db.clayCompanyCache.upsert({ domain: lead.domain, data: fresh });
  return fresh;
});

The step ID includes the domain. Same domain across multiple leads memoizes to the same step output. Even without the cache table, Rotor's step memoization saves the second call.

Schema mismatch. Clay added a new field. Your merge function ignores it. Run a weekly cron that compares the latest Clay response shape to your merge function's known fields and alerts on drift.
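
The comparison that cron job runs can be small. A minimal sketch, assuming you keep a list of the fields your merge function reads; `detectDrift` is a hypothetical name, and the check covers top-level keys only.

```typescript
// Hypothetical drift check: compare a sample response's top-level keys
// against the fields the merge function is known to read.
function detectDrift(
  knownFields: readonly string[],
  response: Record<string, unknown>
): { added: string[]; missing: string[] } {
  const known = new Set(knownFields);
  const seen = new Set(Object.keys(response));
  return {
    // Fields the provider now sends that the merge function ignores.
    added: Array.from(seen).filter((key) => !known.has(key)).sort(),
    // Fields the merge function expects that the provider stopped sending.
    missing: Array.from(known).filter((key) => !seen.has(key)).sort(),
  };
}
```

Alert when either list is non-empty. `missing` is usually the urgent one, because it means HubSpot fields are silently going blank.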

The math

What this costs on Rotor

200 leads enriched per day. Each lead = 5 step-runs (3 enrichments + merge + write-back). 1,000 step-runs/day. 30,000/month.

That runs on Rotor Starter: $29/mo with 20k step-runs included, plus 10k of overage at $1.50 per 1k ($15), for about $44/mo all-in.
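
The same arithmetic as a function, so you can plug in your own volume. `monthlyCostUsd` is a hypothetical helper. Its defaults are the Starter numbers quoted above, and it assumes overage is billed pro rata rather than in whole 1k blocks.

```typescript
// Hypothetical pricing helper using the Starter numbers quoted above.
function monthlyCostUsd(
  stepRuns: number,
  baseUsd = 29,
  includedRuns = 20_000,
  overageUsdPer1k = 1.5
): number {
  // Only runs beyond the included allowance are billed as overage.
  const overageRuns = Math.max(0, stepRuns - includedRuns);
  return baseUsd + (overageRuns / 1000) * overageUsdPer1k;
}

const stepRunsPerMonth = 200 * 5 * 30; // leads/day × step-runs/lead × days
console.log(monthlyCostUsd(stepRunsPerMonth)); // 44
```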

Same workflow on alternatives:

  • Zapier: 30k tasks/month = Professional plan + 30k task tier. Cost varies, but at 2.5 cents per task you're looking at $750/mo just for orchestration on top of the API costs.
  • Make: ~30k credits/mo = Pro tier $29-79/mo depending on credit volume. Cheaper than Zapier, but you still write the same workflow as visual scenarios that take longer to debug.
  • Building it on cron: free until Clay times out at 3am and your night-of-launch leads land in HubSpot without company size or tech stack.

The compounding value is in the audit trail. Every enrichment call is logged with its inputs, outputs, duration, and cost. When your CFO asks why the Clay bill jumped, you can answer in 30 seconds.

Fork this playbook on Rotor.

$9 to start. 30-day money-back guarantee. Hard caps protect you from runaway bills.
