Webhook event deliveries in the EU region were delayed by up to 22 minutes between 18:44 and 20:44 UTC, following an extreme spike in webhook notification request timeouts. Mitigation required scaling up the relevant service.
An unusually high number of webhook notification request timeouts saturated the resources available to process webhooks. Our scale-up strategy was not set up correctly to handle that particular situation.
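To illustrate the failure mode, the sketch below shows a minimal, hypothetical webhook delivery call with a bounded per-request timeout, so a stalled customer endpoint releases its worker slot after a fixed deadline instead of holding it indefinitely. All names, URLs, and timeout values are illustrative assumptions, not a description of our production service.

```go
// Hypothetical sketch: one webhook delivery attempt with a hard
// per-request deadline. Values are illustrative only.
package main

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"time"
)

// deliver posts one webhook payload to a customer endpoint and gives up
// after a fixed timeout, so slow endpoints cannot tie up delivery workers.
func deliver(client *http.Client, url string, payload []byte) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		// Timeouts land here; the caller can requeue the event with backoff.
		return fmt.Errorf("delivery to %s failed: %w", url, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode >= 300 {
		return fmt.Errorf("delivery to %s returned %d", url, resp.StatusCode)
	}
	return nil
}

func main() {
	client := &http.Client{}
	_ = deliver(client, "https://customer.example.com/webhooks", []byte(`{"event":"example"}`))
}
```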
17:34 UTC: Request timeouts to specific customer webhook endpoints began to increase sharply
17:46 UTC: We were alerted to moderately abnormal webhook latency for some events
18:04 UTC: We scaled up the service’s resources and latency stabilised
18:44 UTC: Webhook latency began to degrade significantly for all events
20:20 - 20:40 UTC: Further adjustments were made to the service's autoscaling configuration
20:44 UTC: Incident fully recovered, including clearing of any queued requests
We have updated our auto-scaling configuration. Follow-up work is planned to improve handling of abnormal request timeouts.
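As a rough illustration of the direction of that follow-up work, the sketch below derives a desired worker count from the delivery backlog rather than from CPU alone, so a timeout-induced queue build-up still triggers a scale-up. The function name, thresholds, and numbers are hypothetical assumptions, not our actual configuration.

```go
// Hypothetical sketch of a backlog-based scaling rule: desired worker
// count follows queued deliveries per worker, clamped to a min/max.
package main

import "fmt"

// desiredReplicas returns the worker count wanted for the current
// backlog, using ceiling division and clamping to [min, max].
func desiredReplicas(queuedDeliveries, perWorkerTarget, min, max int) int {
	n := (queuedDeliveries + perWorkerTarget - 1) / perWorkerTarget
	if n < min {
		return min
	}
	if n > max {
		return max
	}
	return n
}

func main() {
	// e.g. 12,000 queued deliveries at a target of 500 per worker -> 24 workers.
	fmt.Println(desiredReplicas(12000, 500, 4, 50))
}
```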