Skills · May 7, 2026 · 3 min read

Replicate Webhooks — Async Notifications for Slow Models

Replicate Webhooks let async predictions notify your server when ready. Skip polling for slow models (FLUX, video gen). HMAC-signed for verifiable origin.

Universal CLI install command
npx tokrepo install 39c66b39-d832-457b-89f8-308f599fc64b
Intro

Replicate Webhooks let you start a prediction and have Replicate POST to your server when it's done — no polling needed. Critical for slow models (FLUX image generation, video generation, large LLMs) where the prediction can run for tens of seconds to minutes. Best for: production apps with async UI, agents that fire-and-forget, and queue-based pipelines. Works with: Replicate API, any HTTP endpoint. Setup time: 5 minutes.


Start a prediction with webhook

import Replicate from "replicate";

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// Inside your route handler:
const prediction = await replicate.predictions.create({
  // Official models are addressed by name via `model`;
  // use `version` with a version ID for other models.
  model: "black-forest-labs/flux-schnell",
  input: { prompt: "A misty forest at dawn, photorealistic" },
  webhook: "https://yourapp.com/webhooks/replicate",
  webhook_events_filter: ["completed"],  // or ["start", "output", "logs", "completed"]
});

// Return the ID right away; the result arrives later via the webhook
return Response.json({ predictionId: prediction.id });

Receive the webhook (Next.js example)

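// app/webhooks/replicate/route.ts
import crypto from "node:crypto";

export async function POST(req: Request) {
  // Read the signature headers and the raw body (verify before parsing)
  const signature = req.headers.get("webhook-signature") ?? "";
  const timestamp = req.headers.get("webhook-timestamp");
  const id = req.headers.get("webhook-id");
  const body = await req.text();

  // The secret looks like "whsec_<base64>"; the HMAC key is the
  // base64-decoded part after the prefix, not the raw string
  const key = Buffer.from(
    process.env.REPLICATE_WEBHOOK_SECRET!.replace(/^whsec_/, ""),
    "base64",
  );
  const expected = Buffer.from(
    crypto.createHmac("sha256", key).update(`${id}.${timestamp}.${body}`).digest("base64"),
  );

  // The header holds space-separated "v1,<signature>" entries;
  // compare each one in constant time
  const valid = signature.split(" ").some((entry) => {
    const candidate = Buffer.from(entry.split(",")[1] ?? "");
    return candidate.length === expected.length && crypto.timingSafeEqual(candidate, expected);
  });
  if (!valid) {
    return Response.json({ error: "invalid signature" }, { status: 401 });
  }

  // Process the prediction (saveImageUrl and flagFailure are your own app helpers)
  const prediction = JSON.parse(body);
  if (prediction.status === "succeeded") {
    await saveImageUrl(prediction.id, prediction.output);
  } else if (prediction.status === "failed" || prediction.status === "canceled") {
    await flagFailure(prediction.id, prediction.error);
  }

  return Response.json({ ok: true });
}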

Get the webhook secret

# Retrieve your webhook signing secret (this endpoint is a GET)
curl -s https://api.replicate.com/v1/webhooks/default/secret \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN"

# Returns: { "key": "whsec_..." }
# Store as REPLICATE_WEBHOOK_SECRET env var

Why this beats polling

  • Polling: at 5-second intervals you learn about completion up to 5 seconds late (about 2.5 seconds on average), and every poll is an API request that counts against your quota
  • Webhooks: your handler is typically called within about a second of completion, with zero polling load
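To put rough numbers on the polling side, the sketch below estimates the overhead of fixed-interval polling for a single prediction. It assumes the prediction finishes at a uniformly random point between two polls; `pollingCost` is a hypothetical helper for illustration, not part of any Replicate SDK.

```typescript
// Hypothetical helper: estimates the overhead of fixed-interval polling
// for one prediction. Assumes completion falls at a uniformly random
// point between two consecutive polls.
interface PollingCost {
  requests: number;     // status requests issued until completion is observed
  meanDelaySec: number; // average wait between completion and detection
  maxDelaySec: number;  // worst-case wait between completion and detection
}

function pollingCost(intervalSec: number, runtimeSec: number): PollingCost {
  if (intervalSec <= 0 || runtimeSec < 0) {
    throw new RangeError("intervalSec must be > 0 and runtimeSec must be >= 0");
  }
  return {
    // One request per elapsed interval, and at least one to see the result
    requests: Math.max(1, Math.ceil(runtimeSec / intervalSec)),
    meanDelaySec: intervalSec / 2,
    maxDelaySec: intervalSec,
  };
}

// A 60-second prediction polled every 5 seconds
const cost = pollingCost(5, 60);
console.log(cost.requests, cost.meanDelaySec, cost.maxDelaySec);
```

Shortening the interval lowers the delay but raises the request count proportionally, which is the trade-off webhooks avoid.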

FAQ

Q: Are webhooks idempotent? A: Replicate may retry a webhook on transient errors, so handlers must be idempotent. Use the webhook-id header (a per-event unique ID) for deduplication on your side.
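A minimal sketch of deduplication keyed on the webhook-id header follows. It assumes a single process and keeps IDs in memory; `SeenWebhooks` and its TTL are hypothetical choices for illustration, and a production setup would more likely use a shared store such as a database unique constraint.

```typescript
// In-memory deduplication keyed by the webhook-id header.
// Assumption: one process handles all deliveries; the class name and
// the default TTL are illustrative, not taken from Replicate's docs.
class SeenWebhooks {
  private seen = new Map<string, number>(); // webhook-id -> expiry (ms since epoch)
  private ttlMs: number;

  constructor(ttlMs: number = 24 * 60 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  // Returns true the first time an ID is seen, false for a duplicate delivery.
  markIfNew(webhookId: string, now: number = Date.now()): boolean {
    // Drop expired entries so the map does not grow without bound
    this.seen.forEach((expiry, id) => {
      if (expiry <= now) this.seen.delete(id);
    });
    if (this.seen.has(webhookId)) return false;
    this.seen.set(webhookId, now + this.ttlMs);
    return true;
  }
}

const seen = new SeenWebhooks();
const isFirstDelivery = seen.markIfNew("msg_example");  // true
const isRetry = !seen.markIfNew("msg_example");         // true: second delivery is a duplicate
console.log(isFirstDelivery, isRetry);
```

In a handler, a duplicate would typically still get a 2xx response so the sender stops retrying, while the side effects are skipped.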

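Q: Can I get streaming output via webhooks? A: Yes. Set webhook_events_filter: ["output", "completed"] to receive events as the model produces new output (tokens, frames, partial results) plus the final result. Replicate's docs describe output and logs events as throttled (at most one every 500 ms), so treat them as periodic snapshots for UI updates rather than a token-by-token stream.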
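When intermediate events are enabled, the handler sees non-terminal statuses too, so it helps to route on status explicitly. The sketch below is one way to do that; the `PredictionEvent` type is a simplified assumption about the payload shape, and `routeEvent` is a hypothetical helper.

```typescript
// Simplified view of a webhook payload (assumption: only the fields used here)
type PredictionStatus = "starting" | "processing" | "succeeded" | "failed" | "canceled";

interface PredictionEvent {
  id: string;
  status: PredictionStatus;
  output?: unknown;
  error?: unknown;
}

type Action =
  | { kind: "progress"; partialOutput: unknown }
  | { kind: "done"; output: unknown }
  | { kind: "failed"; error: unknown };

// Hypothetical helper: decides what the app should do with one event.
function routeEvent(event: PredictionEvent): Action {
  if (event.status === "succeeded") {
    return { kind: "done", output: event.output };
  }
  if (event.status === "failed" || event.status === "canceled") {
    return { kind: "failed", error: event.error ?? event.status };
  }
  // "starting" and "processing" are intermediate: push partial output to the UI
  return { kind: "progress", partialOutput: event.output ?? null };
}

const action = routeEvent({ id: "p1", status: "processing", output: ["Hello"] });
console.log(action.kind);
```

Keeping the routing in a pure function like this makes it easy to unit-test separately from the HTTP handler.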

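Q: What if my webhook endpoint is down? A: Replicate retries failed deliveries with exponential backoff, but only for a limited window (check the webhooks documentation for the current schedule), so do not rely on retries alone. Once retries stop, webhooks are not replayed; the prediction is still available via GET /v1/predictions/{id}.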
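Because missed webhooks are not replayed, a periodic reconciliation sweep over predictions you are still waiting on is a common safety net. The sketch below assumes you track pending prediction IDs yourself; `reconcile` and the `GetPrediction` type are hypothetical names, and the lookup function is injected so the logic can be tested without network access.

```typescript
// Minimal view of a prediction (assumption: only the fields used here)
interface PredictionSnapshot {
  id: string;
  status: string;
  output?: unknown;
  error?: unknown;
}

// Injected lookup, e.g. a wrapper around GET /v1/predictions/{id}
type GetPrediction = (id: string) => Promise<PredictionSnapshot>;

const TERMINAL_STATUSES = new Set(["succeeded", "failed", "canceled"]);

// Hypothetical helper: returns the pending predictions that have finished.
async function reconcile(
  pendingIds: string[],
  getPrediction: GetPrediction,
): Promise<PredictionSnapshot[]> {
  const finished: PredictionSnapshot[] = [];
  for (const id of pendingIds) {
    try {
      const prediction = await getPrediction(id);
      if (TERMINAL_STATUSES.has(prediction.status)) finished.push(prediction);
    } catch {
      // Lookup failed; leave the ID pending and try again on the next sweep
    }
  }
  return finished;
}
```

With the Replicate JavaScript client, the lookup could be passed as `(id) => replicate.predictions.get(id)`. Running the sweep every few minutes keeps polling load low while still catching deliveries lost during an outage.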


Quick Use

  1. Retrieve your webhook signing secret: GET /v1/webhooks/default/secret
  2. Pass webhook and webhook_events_filter in your predictions.create call
  3. Implement HMAC-SHA256 signature verification in your handler
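Step 3 can be sketched as a pure function that is easy to unit-test. The version below also rejects stale timestamps to limit replay. It assumes the secret has the form "whsec_<base64 key>" and that the webhook-signature header carries space-separated "v1,<signature>" entries; `signWebhook`, `verifyWebhook`, and the 300-second tolerance are illustrative choices, not Replicate's API.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the base64 HMAC-SHA256 signature over "id.timestamp.body".
// Assumption: the HMAC key is the base64-decoded part after "whsec_".
function signWebhook(secret: string, id: string, timestamp: string, body: string): string {
  const key = Buffer.from(secret.replace(/^whsec_/, ""), "base64");
  return createHmac("sha256", key).update(`${id}.${timestamp}.${body}`).digest("base64");
}

// Returns true only if the timestamp is recent and one header entry matches.
function verifyWebhook(
  secret: string,
  id: string,
  timestamp: string,
  body: string,
  signatureHeader: string,
  toleranceSec: number = 300, // hypothetical replay window
  nowSec: number = Math.floor(Date.now() / 1000),
): boolean {
  const sentAt = Number(timestamp);
  if (!Number.isFinite(sentAt) || Math.abs(nowSec - sentAt) > toleranceSec) {
    return false;
  }
  const expected = Buffer.from(signWebhook(secret, id, timestamp, body));
  return signatureHeader.split(" ").some((entry) => {
    const candidate = Buffer.from(entry.split(",")[1] ?? "");
    // timingSafeEqual requires equal lengths, so check that first
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

In the route handler, the arguments would come from the webhook-id, webhook-timestamp, and webhook-signature headers and the raw request body, read before any JSON parsing.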

Source & Thanks

Built by Replicate. Commercial product.

replicate.com/docs/webhooks — Webhooks documentation
