Anomalous number of 5xx errors on the Studio API
Incident Report for Onfido
Postmortem

Summary

On 2023-10-26, a failure in the rate limiting middleware for the Studio API denied access to rate-limited endpoints for approximately 8 minutes, between 04:12 and 04:20 GMT.

Root Causes

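A security update to the underlying storage system triggered a failover that the rate limiting middleware did not handle correctly. Instead of falling back to bypassing rate limiting altogether, the middleware returned errors for requests to rate-limited endpoints.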

Timeline

  • 2023-10-26 04:12:20 GMT: Rate-limited Studio API endpoints started responding with HTTP 500 status codes.
  • 2023-10-26 04:20:40 GMT: Rate-limited Studio API endpoints started responding successfully again.

Remedies

We will implement an explicit bypass strategy in the rate limiting middleware so that it can tolerate planned maintenance or other circumstances that prevent access to its state storage subsystem.
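
For illustration only, the bypass strategy described above could look something like the following minimal Python sketch. It is not the Studio API implementation: the class names (FixedWindowLimiter, InMemoryStore, StoreUnavailable), the fixed-window algorithm, and the in-memory store are all assumptions chosen to show the idea of failing open when the state store cannot be reached.

```python
import time


class StoreUnavailable(Exception):
    """Hypothetical error raised when the state store cannot be reached."""


class InMemoryStore:
    """Stand-in for the real state store; `available` simulates a failover."""

    def __init__(self):
        self.available = True
        self.counts = {}

    def incr(self, key):
        if not self.available:
            raise StoreUnavailable(key)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


class FixedWindowLimiter:
    """Fixed-window rate limiter that bypasses limiting if the store is down."""

    def __init__(self, store, limit, window_seconds, clock=time.time):
        self.store = store
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.bypassed = 0  # number of requests let through without a check

    def allow(self, key):
        window = int(self.clock() // self.window_seconds)
        try:
            count = self.store.incr(f"{key}:{window}")
        except StoreUnavailable:
            # Explicit bypass: serve the request rather than return a 5xx.
            self.bypassed += 1
            return True
        return count <= self.limit


if __name__ == "__main__":
    store = InMemoryStore()
    limiter = FixedWindowLimiter(store, limit=2, window_seconds=60)
    print([limiter.allow("client-a") for _ in range(3)])
    store.available = False  # simulate the storage failover
    print(limiter.allow("client-a"), limiter.bypassed)
```

The trade-off in this sketch is that rate limits are not enforced while the store is unreachable. Counting bypassed requests, as the `bypassed` attribute does here, gives operators a signal for how long that lasted.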

Posted Nov 22, 2023 - 12:59 UTC

Resolved
On 2023-10-26, a failure in the rate limiting middleware for the Studio API denied access to rate-limited endpoints for approximately 8 minutes, between 04:12 and 04:20 GMT.
Posted Oct 26, 2023 - 04:12 UTC
This incident affected: Europe (onfido.com) (API).