On 10 February 2025, around 17:34 UTC, a dependency upgrade scoped to the component that sources events for the webhooks system was deployed to production. It caused a partial outage in webhook delivery for Studio, affecting ~38% of traffic in the CA and US regions for the duration of the incident. There was no impact in the EU region.
A third-party dependency upgrade in a component that buffers events for internal broadcasting caused a fraction of those events to be dropped, so the corresponding webhooks were never sent. The investigation was unusually hard because the impact differed across regions and was intermittent in nature.
10/2 17:38: change with the dependency upgrade was deployed to the CA region
10/2 18:19: change with the dependency upgrade was deployed to the US region
10/2 18:58: first instance of webhooks not delivered in the US region
10/2 20:38: first instance of webhooks not delivered in the CA region
11/2 18:27: incident was reported and investigation started
12/2 02:14: change was reverted in the CA region
12/2 02:48: change was reverted in the US region