Webhook Retry Logic & Failure Handling

When a webhook delivery fails (because your endpoint is down, times out, or returns a non-2xx response), Rackwave automatically retries the delivery using an exponential backoff schedule. This article explains how retries work, what counts as a failure, how to monitor and recover from failed deliveries, and how to prevent cascading failures.

What Counts as a Webhook Delivery Failure

Failure Type	Condition	Rackwave Action
Timeout	Your endpoint does not respond within 30 seconds	Mark as failed; schedule retry
Non-2xx HTTP response	Your endpoint returns any status code outside the 200–299 range	Mark as failed; schedule retry
Connection refused	Your server is not accepting connections on the registered port	Mark as failed; schedule retry
DNS resolution failure	Your endpoint domain cannot be resolved	Mark as failed; schedule retry
SSL/TLS error	Certificate expired, invalid, or self-signed	Mark as failed; schedule retry
Network unreachable	Rackwave servers cannot reach your endpoint IP	Mark as failed; schedule retry

200–299 all count as success: Rackwave treats any 2xx response (200, 201, 202, 204) as a successful delivery. You do not need to return exactly 200 — any success status stops the retry schedule for that event.

Retry Schedule — Exponential Backoff

Attempt #	Delay After Previous Failure	Approximate Time After Event	What Rackwave Logs
1	Immediate (0 seconds)	T + 0s	Status, response code, response body (first 1 KB)
2	5 minutes	T + 5 min	Same as above
3	30 minutes	T + 35 min	Same as above
4	2 hours	T + 2h 35 min	Same as above
5	8 hours	T + 10h 35 min	Same as above
Final	~13 hours after attempt 5	T + ~24 hours	Marked permanently failed — no further retries

After 24 hours: Events that fail all retry attempts are marked as permanently failed. They are stored in the delivery log for 30 days but will not be retried. If critical business events were missed, you must reconcile them manually using the API reporting endpoints.
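
The cumulative times in the retry table can be reproduced with a short calculation. The sketch below only adds up the delays listed above; the names RETRY_DELAYS, attempt_offsets, and attempt_times are illustrative and not part of any Rackwave SDK.

```python
from datetime import datetime, timedelta, timezone

# Delay before each of attempts 1-5, copied from the retry table.
RETRY_DELAYS = [
    timedelta(seconds=0),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=8),
]


def attempt_offsets(delays=RETRY_DELAYS):
    """Return the cumulative offset from the event time for each attempt."""
    offsets = []
    elapsed = timedelta(0)
    for delay in delays:
        elapsed += delay
        offsets.append(elapsed)
    return offsets


def attempt_times(event_time):
    """Return the approximate wall-clock time of each attempt for one event."""
    return [event_time + offset for offset in attempt_offsets()]


if __name__ == "__main__":
    event_time = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # example value
    for number, when in enumerate(attempt_times(event_time), start=1):
        print(f"attempt {number}: {when.isoformat()}")
```

With the delays above, the fifth attempt lands 10 hours 35 minutes after the event, which matches the table.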

Viewing Failed Webhook Deliveries

In MigoSMTP: go to Developer → Webhooks → [your webhook] → Delivery Log.
In Telnxo: go to Developer → Webhooks → [your webhook] → Delivery Log.
The log shows every delivery attempt with timestamp, HTTP status received, response body (truncated to 1 KB), and current status (pending retry, succeeded, or permanently failed).
Filter by Failed status to see only problem deliveries.
Click any row to expand and see the full request payload that was sent and the error response received.

Manually Retrying a Failed Delivery

If you have fixed the issue on your endpoint (brought it back online, fixed a bug) and want to force a retry without waiting for the scheduled backoff window:

Open the Webhook Delivery Log.
Find the failed delivery entry.
Click Retry Now on the right side of that row.
Rackwave immediately re-sends the original payload to your endpoint.

Auto-Disabling of Persistently Failing Webhooks

To protect the system from repeatedly calling consistently unreachable endpoints, Rackwave may automatically disable a webhook endpoint if it fails every delivery attempt over a 7-day period:

Condition	Action
100% failure rate over 7 consecutive days	Webhook endpoint auto-disabled; email alert sent to account owner
Webhook disabled by system	No new events delivered until webhook is manually re-enabled
Re-enabling a disabled webhook	Fix endpoint → go to Webhooks → click Enable → Rackwave begins delivering new events again
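
To notice an endpoint that is heading toward auto-disable before the 7-day threshold is reached, you could run a check like the one below against your own records of delivery attempts. This is a minimal sketch: the record shape (a list of (timestamp, succeeded) pairs) and the 3-day early-warning window are assumptions, not a documented Rackwave export format or setting.

```python
from datetime import datetime, timedelta, timezone


def all_failed_within(attempts, now, window=timedelta(days=3)):
    """Return True if the window contains at least one attempt and none succeeded.

    attempts: iterable of (timestamp, succeeded) pairs (assumed shape).
    """
    recent = [succeeded for timestamp, succeeded in attempts
              if now - window <= timestamp <= now]
    return bool(recent) and not any(recent)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        (now - timedelta(days=1), False),
        (now - timedelta(days=2), False),
        (now - timedelta(days=5), True),
    ]
    if all_failed_within(sample, now):
        print("Every delivery in the last 3 days failed; check the endpoint.")
```

Choosing a window shorter than 7 days leaves time to fix the endpoint before the webhook is disabled.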

Best Practices for Reliable Webhook Handling

Return 200 immediately — do not perform heavy processing before responding. Queue the event and process it asynchronously.
Use a job queue — push received payloads to a queue (Redis, RabbitMQ, AWS SQS) and process them in background workers. This completely decouples your endpoint response time from processing time.
Set up health monitoring on your endpoint URL — use an uptime monitoring tool (UptimeRobot, Pingdom) to alert you immediately if your endpoint goes down.
Design idempotent handlers — since retries can deliver the same event multiple times, ensure your processing logic handles duplicate deliveries gracefully using event IDs as deduplication keys.
Log every received payload — store raw payloads in your own database for 30 days so you can replay or audit events if needed.
Monitor the Rackwave delivery log — set up a weekly review of your webhook delivery log to catch intermittent failures before they become persistent.
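
The first, second, and fourth practices above can be combined in a small receiving function. The sketch below is framework-independent and rests on several assumptions: the payload is JSON, the event ID lives in a field named "id" (a hypothetical name), and the in-memory queue and set stand in for a real job queue and a persistent deduplication store.

```python
import json
import queue

event_queue = queue.Queue()  # stand-in for Redis, RabbitMQ, or AWS SQS
seen_event_ids = set()       # stand-in for a persistent deduplication store


def receive_webhook(raw_body: bytes) -> int:
    """Validate, deduplicate, and enqueue a payload.

    Returns the HTTP status code the endpoint should send back.
    """
    try:
        event = json.loads(raw_body)
        event_id = event["id"]  # hypothetical field name
    except (ValueError, KeyError, TypeError):
        # A non-2xx response counts as a failure and will be retried.
        return 400
    if not isinstance(event_id, (str, int)):
        return 400
    if event_id in seen_event_ids:
        # Duplicate from a retry: acknowledge it without reprocessing.
        return 200
    seen_event_ids.add(event_id)
    event_queue.put(event)  # heavy processing happens later in a worker
    return 202


def drain_queue(process):
    """Run process(event) for every queued event; call this from a worker."""
    while not event_queue.empty():
        process(event_queue.get())
```

Because any 2xx status stops the retry schedule, returning 202 for accepted events and 200 for duplicates both end retries for that event.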

Reconciling Missed Events

If critical events were permanently missed (for example, when a prolonged outage exhausts all retries), you can reconcile the data gap using the platform reporting API:

MigoSMTP: Use GET /v1/reports/messages?from=<start>&to=<end> to fetch message delivery status for any time window.
Telnxo: Use GET /v1/reports/messages with date range parameters for SMS, Voice, and WhatsApp logs.
Compare the API-sourced data with your internal records to identify and fill the gaps.
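
The comparison step might look like the sketch below. Only the path /v1/reports/messages and the from and to parameters come from this article; the host name is a placeholder, and the date format, the response record shape, and the "message_id" field are assumptions to replace with the values from the API reference.

```python
from urllib.parse import urlencode

REPORTS_BASE_URL = "https://api.example.com"  # placeholder host, not a real endpoint


def reports_url(start: str, end: str) -> str:
    """Build the reporting URL for a time window (date format is assumed)."""
    query = urlencode({"from": start, "to": end})
    return f"{REPORTS_BASE_URL}/v1/reports/messages?{query}"


def find_missing(api_records, local_ids, id_field="message_id"):
    """Return the API records whose ID does not appear in your own storage."""
    known = set(local_ids)
    return [record for record in api_records if record[id_field] not in known]


if __name__ == "__main__":
    print(reports_url("2024-01-01", "2024-01-02"))
    api_records = [  # example data in the assumed response shape
        {"message_id": "m1", "status": "delivered"},
        {"message_id": "m2", "status": "bounced"},
    ]
    print(find_missing(api_records, local_ids=["m1"]))
```

Records returned by find_missing are the ones to backfill into your internal system.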
