When a webhook delivery fails because your endpoint is down, times out, or returns a non-2xx response, Rackwave automatically retries the delivery on an exponential backoff schedule. This article explains how retries work, what counts as a failure, how to monitor and recover from failed deliveries, and how to prevent cascading failures.
What Counts as a Webhook Delivery Failure
| Failure Type | Condition | Rackwave Action |
|---|---|---|
| Timeout | Your endpoint does not respond within 30 seconds | Mark as failed; schedule retry |
| Non-2xx HTTP response | Your endpoint returns any status code outside the 200–299 range | Mark as failed; schedule retry |
| Connection refused | Your server is not accepting connections on the registered port | Mark as failed; schedule retry |
| DNS resolution failure | Your endpoint domain cannot be resolved | Mark as failed; schedule retry |
| SSL/TLS error | Certificate expired, invalid, or self-signed | Mark as failed; schedule retry |
| Network unreachable | Rackwave servers cannot reach your endpoint IP | Mark as failed; schedule retry |
Retry Schedule — Exponential Backoff
| Attempt # | Delay After Previous Failure | Approximate Time After Event | What Rackwave Logs |
|---|---|---|---|
| 1 | Immediate (0 seconds) | T + 0s | Status, response code, response body (first 1 KB) |
| 2 | 5 minutes | T + 5 min | Same as above |
| 3 | 30 minutes | T + 35 min | Same as above |
| 4 | 2 hours | T + 2h 35 min | Same as above |
| 5 | 8 hours | T + 10h 35 min | Same as above |
| Final | ~13 hours after attempt 5 | T + ~24 hours | Marked permanently failed — no further retries |
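
The "Approximate Time After Event" column is the running total of the delays in the second column. The short Python sketch below reproduces that arithmetic for attempts 1 to 5. The delays are copied from the table; the names `RETRY_DELAYS` and `attempt_offsets` are illustrative only and are not part of any Rackwave API.

```python
from datetime import timedelta

# Delay before each attempt, copied from the schedule table above.
RETRY_DELAYS = [
    timedelta(0),           # attempt 1: immediate
    timedelta(minutes=5),   # attempt 2
    timedelta(minutes=30),  # attempt 3
    timedelta(hours=2),     # attempt 4
    timedelta(hours=8),     # attempt 5
]


def attempt_offsets(delays=RETRY_DELAYS):
    """Return the cumulative offset from the event time T for each attempt."""
    offsets = []
    elapsed = timedelta(0)
    for delay in delays:
        elapsed += delay
        offsets.append(elapsed)
    return offsets


for number, offset in enumerate(attempt_offsets(), start=1):
    print(f"attempt {number}: T + {offset}")
```

The last scheduled attempt lands at T + 10h 35 min, which is why an endpoint that stays down for most of a day can exhaust every retry.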
Viewing Failed Webhook Deliveries
- In MigoSMTP: go to Developer → Webhooks → [your webhook] → Delivery Log.
- In Telnxo: go to Developer → Webhooks → [your webhook] → Delivery Log.
- The log shows every delivery attempt with timestamp, HTTP status received, response body (truncated to 1 KB), and current status (pending retry, succeeded, or permanently failed).
- Filter by Failed status to see only problem deliveries.
- Click any row to expand and see the full request payload that was sent and the error response received.
Manually Retrying a Failed Delivery
If you have fixed the issue on your endpoint (for example, brought it back online or fixed a bug) and want to force a retry without waiting for the next scheduled backoff window:
1. Open the Webhook Delivery Log.
2. Find the failed delivery entry.
3. Click Retry Now on the right side of that row.
4. Rackwave immediately re-sends the original payload to your endpoint.
Auto-Disabling of Persistently Failing Webhooks
To protect the system from repeatedly calling consistently unreachable endpoints, Rackwave may automatically disable a webhook endpoint if it fails every delivery attempt over a 7-day period:
| Condition | Action |
|---|---|
| 100% failure rate over 7 consecutive days | Webhook endpoint auto-disabled; email alert sent to account owner |
| Webhook disabled by system | No new events delivered until webhook is manually re-enabled |
| Re-enabling a disabled webhook | Fix endpoint → go to Webhooks → click Enable → Rackwave begins delivering new events again |
Best Practices for Reliable Webhook Handling
- Return 200 immediately — do not perform heavy processing before responding. Queue the event and process it asynchronously.
- Use a job queue — push received payloads to a queue (Redis, RabbitMQ, AWS SQS) and process them in background workers. This completely decouples your endpoint response time from processing time.
- Set up health monitoring on your endpoint URL — use an uptime monitoring tool (UptimeRobot, Pingdom) to alert you immediately if your endpoint goes down.
- Design idempotent handlers — since retries can deliver the same event multiple times, ensure your processing logic handles duplicate deliveries gracefully using event IDs as deduplication keys.
- Log every received payload — store raw payloads in your own database for 30 days so you can replay or audit events if needed.
- Monitor the Rackwave delivery log — set up a weekly review of your webhook delivery log to catch intermittent failures before they become persistent.
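
The first, second, and fourth practices above can be combined in one small receiver. What follows is a minimal sketch using only the Python standard library, not a production implementation. It makes several assumptions: the payload is JSON, the event ID lives in a field named `id` (check your actual payloads), and the in-memory queue and set stand in for a real job queue and a persistent deduplication store. All function and variable names are hypothetical.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

event_queue = queue.Queue()   # stand-in for Redis, RabbitMQ, or AWS SQS
seen_event_ids = set()        # stand-in for a persistent deduplication store
seen_lock = threading.Lock()
processed = []                # records handled events for this sketch


def accept_event(raw_body: bytes) -> int:
    """Parse and enqueue a payload, then return the HTTP status to send back.

    No heavy work happens here, so the response goes out quickly.
    """
    try:
        event = json.loads(raw_body)
    except ValueError:
        return 400  # note: any non-2xx status causes the delivery to be retried
    if not isinstance(event, dict) or "id" not in event:  # "id" is an assumed field name
        return 400
    event_queue.put(event)
    return 200


def process_event(event: dict) -> bool:
    """Handle an event once. Return False if its ID was already seen."""
    with seen_lock:
        if event["id"] in seen_event_ids:
            return False
        seen_event_ids.add(event["id"])
    processed.append(event)  # real business logic would go here
    return True


def worker() -> None:
    """Background loop that drains the queue independently of HTTP responses."""
    while True:
        event = event_queue.get()
        try:
            process_event(event)
        finally:
            event_queue.task_done()


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = accept_event(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Start the background worker and the HTTP listener (blocks forever)."""
    threading.Thread(target=worker, daemon=True).start()
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Because `accept_event` only parses and enqueues, response time stays well inside the 30-second timeout no matter how long processing takes, and `process_event` ignores a retry that delivers an event ID it has already handled.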
Reconciling Missed Events
If critical events were permanently missed (for example, when a prolonged outage exhausted all retries), you can reconcile the data gap using the platform reporting API:
- MigoSMTP: Use `GET /v1/reports/messages?from=<start>&to=<end>` to fetch message delivery status for any time window.
- Telnxo: Use `GET /v1/reports/messages` with date range parameters for SMS, Voice, and WhatsApp logs.
- Compare the API-sourced data with your internal records to identify and fill the gaps.
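
The comparison step can be sketched in Python as below. Only the path `/v1/reports/messages` and the `from` and `to` parameters come from this article. The base URL, the date format, the `message_id` field name, and the shape of the returned records are assumptions made for illustration, so check them against the API reference before relying on them. Authentication and the HTTP call itself are omitted.

```python
from urllib.parse import urlencode

API_BASE = "https://api.example.com"  # hypothetical base URL, replace with your own


def report_url(start: str, end: str, base: str = API_BASE) -> str:
    """Build the reporting URL for a time window (date format is an assumption)."""
    query = urlencode({"from": start, "to": end})
    return f"{base}/v1/reports/messages?{query}"


def find_missing(api_records, local_ids, key="message_id"):
    """Return API records whose ID is absent from your internal records.

    `key` is an assumed field name; adjust it to match the real response.
    """
    known = set(local_ids)
    return [record for record in api_records if record[key] not in known]


# Example with made-up data standing in for a parsed API response.
api_records = [
    {"message_id": "m1", "status": "delivered"},
    {"message_id": "m2", "status": "bounced"},
    {"message_id": "m3", "status": "delivered"},
]
local_ids = ["m1", "m3"]

print(report_url("2024-05-01", "2024-05-02"))
print(find_missing(api_records, local_ids))
```

Records returned by `find_missing` are the ones to backfill into your own system, ideally through the same idempotent processing path your webhook handler uses.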