Our maintenance is complete and the databases and Market sites are working correctly again. I want to apologise for the maintenance running over the expected window. We hit a couple of hiccups, and I wanted to let you know what they were and what we’ve done to deal with them.
Firstly, we don’t do this particularly often, so the processes we follow to shut down all access to the database were not as slick as they could have been. We have begun work to improve these.
Secondly, when we finished our maintenance and first restored access to the website, around 45 minutes into the maintenance window, we hit a problem that was predictable in hindsight: the thundering herd problem. Lots of pent-up requests flooded through simultaneously, so the database was dealing with about 5x its normal traffic. Compounding this, the database caches were cold, so queries that normally take less than a second were taking 10-15 seconds. Thankfully the new database version we are running has settings that help deal with this, so it should be less of a problem in future. We will also modify our restart processes so that we let in a little traffic at a time and slowly increase it as the caches warm up.
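To illustrate the idea of letting traffic in gradually, here is a minimal sketch. It is not our actual restart tooling: the function names, the linear ramp, the 5% starting fraction and the ten-minute ramp duration are all assumptions chosen for the example.

```python
import random


def admit_fraction(elapsed_s: float, ramp_s: float = 600.0, start: float = 0.05) -> float:
    """Fraction of requests to let through `elapsed_s` seconds after restart.

    Hypothetical policy: begin at `start` (5%) and rise linearly to 100%
    over `ramp_s` seconds, giving the caches time to warm up.
    """
    if elapsed_s <= 0:
        return start
    if elapsed_s >= ramp_s:
        return 1.0
    return start + (1.0 - start) * (elapsed_s / ramp_s)


def should_admit(elapsed_s: float, rng: random.Random,
                 ramp_s: float = 600.0, start: float = 0.05) -> bool:
    """Randomly admit a request with probability equal to the current fraction.

    Requests that are not admitted would be shown a "please retry shortly"
    page rather than being passed on to the database.
    """
    return rng.random() < admit_fraction(elapsed_s, ramp_s, start)
```

In a real setup the ramp would more likely be driven by observed query latency or cache hit rate than by a fixed clock, but the shape is the same: start small and open up as the database recovers.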
Finally, we were pretty optimistic about how long this would take. We will try to be less so next time and set a better expectation of how long things will take.
Sorry once again and thanks for your patience.