Incident report for degraded service on Envato Elements on 4 May 2018

On Friday, 4 May 2018, we had an outage lasting 2 hours and 14 minutes, between 9:42 AM and 11:56 AM AEST (23:42 UTC on 3 May to 01:56 UTC on 4 May). During this time, the browse and search functionality on Envato Elements was degraded and largely unavailable.

We take the availability and performance of our service seriously, and we are deeply sorry for the degraded service.

What happened

The upstream provider for our search services had an issue with their proxy layer, which caused degraded connections to clusters. This meant that connections to our search cluster were failing and we could no longer fulfil any requests that fetched items.

During this window, a small number of connections were successful. Additionally, there was a layer of caching that meant some common queries did still return search results. While we’re glad to have been able to serve some requests successfully, the issue highlighted a critical point of failure in our system. We’re now able to learn from this incident and ensure this part of our system is more resilient and robust in future.

What we’re doing to avoid this happening again

We are implementing strategies to ensure that we have a failover option if an outage like this occurs again. This means that if our primary cluster goes down, we’ll have a backup available in another region that we can switch over to instantaneously.
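
As a rough illustration only, and not our production code, a failover of the kind described above could be sketched in Python as follows. Every name here (`search_with_failover`, `primary`, `backup`, `AllClustersFailed`) is hypothetical, and each cluster is modelled as a plain function that either returns results or raises `ConnectionError`.

```python
from typing import Callable, List, Sequence


class AllClustersFailed(Exception):
    """Raised when every configured cluster fails to answer (hypothetical name)."""


def search_with_failover(
    query: str, clusters: Sequence[Callable[[str], List[str]]]
) -> List[str]:
    """Try each cluster in order and return the first successful response."""
    errors = []
    for cluster in clusters:
        try:
            return cluster(query)
        except ConnectionError as exc:
            # This cluster is unreachable; remember why and try the next one.
            errors.append(exc)
    raise AllClustersFailed(errors)


# Stand-ins for real clusters, for illustration only.
def primary(query: str) -> List[str]:
    raise ConnectionError("primary cluster unreachable")


def backup(query: str) -> List[str]:
    return [f"result for {query}"]


results = search_with_failover("logo", [primary, backup])
```

In this sketch the backup is only consulted after the primary fails, so a real setup would also need short connection timeouts and health checks to keep the switch-over fast.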

We are also consulting with our upstream provider on the specific details of the incident to find the best way to manage this type of problem, and to increase resilience of our hosted infrastructure.

We are fiercely committed to our mission of empowering creative people — when our community succeeds, we succeed — so we apologise again unreservedly for the outage.

Keep creating amazing things.

Sent with :green_heart: from the Envato Elements team.
