Webhooks Best Practices
Design resilient webhook receivers that handle retries, timeouts, and duplicate deliveries without breaking your application.
Best Practices
Core Principles for Reliable Webhook Receivers
Every webhook integration you build will face network hiccups, server restarts, and payload duplicates. Following these principles ensures your system stays resilient under real-world conditions.
Always Acknowledge Within 3 Seconds
Respond with a 2xx status code as fast as possible. Defer heavy processing to a background queue. Many providers enforce a short response timeout, treat slow responses as failed deliveries, and retry them on a backoff schedule, which multiplies the load on your endpoint.
Verify Signatures on Every Request
Never trust incoming webhook payloads blindly. Validate the signature header (e.g., X-WebhookWatch-Signature) using your shared secret before processing any data. This prevents spoofed events from corrupting your system.
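As a minimal sketch of that check, the helper below compares the header against an HMAC of the raw request body in constant time. It assumes the signature is a hex-encoded HMAC-SHA256 of the body; the function name `verifySignature` is hypothetical, and your provider's signing scheme may differ.

```javascript
const crypto = require('crypto');

// Hypothetical helper: returns true only if `signatureHeader` matches the
// hex-encoded HMAC-SHA256 of the raw request body (an assumed signing scheme).
function verifySignature(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== 'string') return false;
  const expected = crypto
    .createHmac('sha256', secret)
    .update(rawBody)
    .digest('hex');
  const received = Buffer.from(signatureHeader, 'utf8');
  const expectedBuf = Buffer.from(expected, 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first
  return (
    received.length === expectedBuf.length &&
    crypto.timingSafeEqual(received, expectedBuf)
  );
}
```

Pass the raw bytes as received, not a re-serialized object: re-encoding parsed JSON can change whitespace or key order and produce a different digest.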
Implement Idempotency Keys
Store the event ID or idempotency key from each payload in a database. Before processing, check if you've already handled that event. This prevents duplicate charges, double emails, or corrupted state when retries arrive.
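The check-then-record step can be sketched as a single "claim" operation, so two concurrent deliveries of the same event cannot both pass the check. The class below is a hypothetical in-memory illustration; a production system would more likely rely on a database unique constraint or a Redis `SET` with `NX` and an expiry.

```javascript
// Hypothetical in-memory store for illustration only. It does not survive
// restarts and is not shared across processes.
class ProcessedEventStore {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, which makes expiry testable
    this.seen = new Map(); // eventId -> expiry timestamp in ms
  }

  // Returns true if the event is new (and records it), false for a duplicate.
  claim(eventId) {
    const t = this.now();
    const expiry = this.seen.get(eventId);
    if (expiry !== undefined && expiry > t) return false;
    this.seen.set(eventId, t + this.ttlMs);
    return true;
  }
}
```

Choose a TTL longer than your provider's full retry window, so a late retry is still recognized as a duplicate.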
Log Everything, Alert on Failures
Record every incoming webhook payload, response code, and processing outcome. Set up alerts for repeated 5xx errors. WebhookWatch captures all failed deliveries automatically so you can replay them from the dashboard.
Test with Simulated Failures
Use WebhookWatch to simulate network partitions, delayed responses, and duplicate deliveries. Verify your receiver handles each scenario gracefully before going to production. Real failures always expose gaps you missed in staging.
Validate Payload Structure Early
Check required fields and data types at the top of your handler. Return a 400 Bad Request for malformed payloads so the provider stops retrying invalid data. Log the raw payload for debugging before you return the error.
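A validation step along these lines might look like the sketch below. The `id` and `type` fields match the reference handler later on this page; the `created` timestamp field and the function name `validatePayload` are assumptions for illustration, not a documented payload format.

```javascript
// Hypothetical validator: returns a list of problems, empty when the payload
// has the expected shape. The `created` field is an assumed example field.
function validatePayload(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return ['payload must be a JSON object'];
  }
  const errors = [];
  if (typeof body.id !== 'string' || body.id.length === 0) {
    errors.push('id must be a non-empty string');
  }
  if (typeof body.type !== 'string' || body.type.length === 0) {
    errors.push('type must be a non-empty string');
  }
  if (!Number.isFinite(body.created)) {
    errors.push('created must be a numeric timestamp');
  }
  return errors;
}
```

A handler could call this first and respond with status 400 and the error list whenever the result is non-empty.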
Common Pitfalls
Mistakes That Break Webhook Integrations
We've analyzed over 12,000 webhook endpoints through WebhookWatch. These are the most frequent failure patterns we see in production systems.
Processing Before Responding
Running database queries, sending emails, or calling external APIs before sending a 200 OK response. This causes cascading retries when your handler exceeds the 3-second timeout. Each retry repeats the same slow work, which multiplies load and cost and can corrupt state when an event is processed more than once.
Silently Swallowing Errors
Catching exceptions and returning 200 OK even when processing failed. The provider thinks delivery succeeded, but your system missed the event. You'll discover missing orders or unprocessed payments days later with no audit trail.
Skipping Signature Verification
Running webhook endpoints without verifying the HMAC signature. Any malicious actor can send fake events to your public URL. This has led to account takeovers and fraudulent transactions in at least 3 public postmortems we've reviewed.
Ignoring Event Ordering
Assuming events arrive in chronological order. When retries mix with new events, you might process a "payment.cancelled" event before "payment.completed". Use the event timestamp (or a sequence number, if the provider supplies one) to detect stale events, and use idempotency keys to discard duplicates.
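One way to guard against stale events is to remember the newest timestamp applied to each resource and skip anything older. The sketch below assumes each event carries a resource ID and a numeric timestamp; `lastEventAt` and `shouldApply` are hypothetical names, and the in-memory map stands in for a column in your database.

```javascript
// Hypothetical per-resource state: maps a resource ID to the timestamp of
// the newest event already applied to it.
const lastEventAt = new Map();

// Returns true and records the timestamp if the event is newer than anything
// applied to this resource so far; returns false for stale events.
function shouldApply(resourceId, eventTimestamp) {
  const last = lastEventAt.get(resourceId);
  if (last !== undefined && eventTimestamp <= last) return false;
  lastEventAt.set(resourceId, eventTimestamp);
  return true;
}
```

This "newest wins" rule suits events that overwrite state, such as a status change. Events that must each be recorded, such as ledger entries, need to be stored and applied in timestamp order instead of being skipped.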
Retry Logic
Designing Your Retry Strategy
Webhook providers use exponential backoff with jitter to retry failed deliveries. Your receiver should anticipate these retries and handle them correctly. Here's a reference implementation for a robust webhook handler.
const crypto = require('crypto');
const express = require('express');
const { processEvent } = require('./queue');
// Assumption: sendAlert lives in a local alerting module; adjust the path to your project
const { sendAlert } = require('./alerts');

const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET;
// In-memory for illustration only; use a durable store (database, Redis) in production
const processedEvents = new Set();

const app = express();
// Keep the raw bytes so the HMAC covers exactly what the provider signed
app.use(express.json({ verify: (req, res, buf) => { req.rawBody = buf; } }));

app.post('/webhooks/receive', async (req, res) => {
  // 1. Verify signature immediately, using a constant-time comparison
  const signature = Buffer.from(String(req.headers['x-webhookwatch-signature'] || ''));
  const expected = Buffer.from(
    crypto
      .createHmac('sha256', WEBHOOK_SECRET)
      .update(req.rawBody || '')
      .digest('hex')
  );
  if (signature.length !== expected.length || !crypto.timingSafeEqual(signature, expected)) {
    return res.status(401).send('Invalid signature');
  }

  // 2. Check idempotency before any processing
  const eventId = req.body.id;
  const eventType = req.body.type;
  if (processedEvents.has(eventId)) {
    console.log(`Event ${eventId} already processed, returning 200`);
    return res.status(200).send('Already processed');
  }

  // 3. Acknowledge immediately, defer work
  processedEvents.add(eventId);
  res.status(200).send('Received');

  // 4. Process asynchronously; failures here won't trigger provider retries
  try {
    await processEvent(eventId, eventType, req.body);
  } catch (err) {
    console.error(`Failed to process ${eventId}:`, err.message);
    // Do NOT remove from processedEvents; handle via alerting
    await sendAlert({ eventId, error: err.message });
  }
});

app.listen(process.env.PORT || 3000);
Exponential Backoff Schedule
Many providers retry on a schedule like 1m, 5m, 15m, 1h, 4h, 12h, 24h. After those 7 retries, spread over roughly 41 hours (about 1.7 days), they stop. Design your system to handle events arriving hours or days after the original trigger.
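To see how long a receiver must be prepared to accept late deliveries, it helps to sum the intervals. The sketch below uses the example schedule from this section (not any specific provider's published schedule) and computes when each retry lands relative to the first failure.

```javascript
// Retry delays from the example schedule, in minutes.
const scheduleMinutes = [1, 5, 15, 60, 240, 720, 1440];

// Returns the elapsed minutes from the first failure to each retry.
function cumulativeOffsets(delays) {
  let total = 0;
  return delays.map((delay) => (total += delay));
}

const offsets = cumulativeOffsets(scheduleMinutes);
const totalHours = offsets[offsets.length - 1] / 60;

console.log(offsets);    // [ 1, 6, 21, 81, 321, 1041, 2481 ]
console.log(totalHours); // 41.35
```

The final retry arrives about 41 hours after the first failure, so deduplication records and any "is this event still relevant" checks need to cover at least that window.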
Jitter Prevents Thundering Herds
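Providers add random jitter to retry intervals so thousands of failed webhooks don't hit your server simultaneously. Because jitter shifts each retry by a different amount, retried events can arrive out of order relative to one another and to new events, so your receiver must tolerate out-of-order arrivals as well as duplicates.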
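For intuition, here is a sketch of one common jitter strategy, often called "full jitter": the sender picks a uniformly random delay between zero and the capped exponential delay. This illustrates the general technique and is not a description of how any particular provider computes its schedule; the function name and parameters are hypothetical.

```javascript
// Full jitter: a uniformly random delay in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the calculation can be tested deterministically.
function backoffWithJitter(attempt, baseMs, capMs, random = Math.random) {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * exponential);
}

// Example: delays for the first five retries with a 1s base and 60s cap.
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(attempt, backoffWithJitter(attempt, 1000, 60000));
}
```

Because every sender draws a different random delay, two events that failed at the same moment will usually be retried at different times, which is why arrival order cannot be relied on.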
Fail Fast on Permanent Errors
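If a payload is malformed or your schema has changed, return a 400 or 422 status code. With many providers this signals that the delivery should not be retried, since only 5xx responses and timeouts trigger retries. Some providers retry on any non-2xx response, so check your provider's documentation. Don't waste provider resources on unfixable errors.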
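One way to keep this decision explicit is to classify errors where they are thrown and map them to a status code in one place. The sketch below is an illustration under that assumption; `PermanentError` and `statusForError` are hypothetical names, not part of any library.

```javascript
// Hypothetical error class marking failures that a retry cannot fix,
// such as a malformed payload or an unsupported schema version.
class PermanentError extends Error {}

// Maps a processing error to the HTTP status the receiver should return.
function statusForError(err) {
  if (err instanceof PermanentError) return 422; // permanent: a retry cannot help
  return 500; // transient or unknown: let the provider retry
}
```

Defaulting unknown errors to 500 errs on the side of a retry, which avoids silently dropping an event because of a failure you did not anticipate.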