Hey guys, as we bring our systems back online, you might notice a few bumps here and there.
What happened is that our memcached stack failed miserably, but in a really, incredibly sneaky way. According to all of our tools and monitoring services, memcache was still online and working, just slower than usual, which happens when we have a lot of users doing things that require caching (such as liking avatars on the last day of an event!).
However, what was secretly happening was that memcache was "holding" connections, and a server can only have a certain number of connections open at a time. That limit also covers Apache (our web server!) and the other services that we run to make Subeta run as smoothly (cough) as possible.
This meant that our web server (and even memcache, trying to steal more connections!) couldn't make the connections to the database or the other services that it needs to serve pages to you guys, the end users. We couldn't see what was causing the problem, but we've now established that it was memcache, have (hopefully) fixed it, and are in the process of bringing everything back online.
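For the curious, here is a toy sketch of that kind of failure in Python. It is only an illustration and not our actual setup: the `Server` class, the `simulate` function, and the limit of 5 connections are all made up for the example. It shows how one service that never gives its connections back can leave nothing for the web server.

```python
class Server:
    """Hypothetical server that allows a fixed number of open connections."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = []

    def connect(self, client):
        # Refuse the connection when every slot is already taken.
        if len(self.open_connections) >= self.max_connections:
            return False
        self.open_connections.append(client)
        return True

    def disconnect(self, client):
        # Free the slot so that someone else can use it.
        self.open_connections.remove(client)


def simulate(max_connections=5, held_by_cache=5):
    """Return (open connection count, whether the web server got a slot)."""
    server = Server(max_connections)

    # A healthy cache would connect, do its work, and disconnect.
    # This misbehaving one "holds" every connection it opens.
    for i in range(held_by_cache):
        server.connect(f"memcache-{i}")  # never disconnected

    # Now the web server asks for a connection of its own.
    web_ok = server.connect("apache")
    return len(server.open_connections), web_ok


if __name__ == "__main__":
    print(simulate())      # cache holds every slot, web server is refused
    print(simulate(5, 2))  # cache holds only two slots, web server gets in
```

Notice that in the bad case the cache's connections all look "open and working" from the outside, which is roughly why this sort of problem is so hard to spot.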
THANK YOU for being so patient with us today. I know that it can be stressful trying to get around the site while something like this is happening, and we really appreciate you sticking with us.
Update: For those of you interested, here is a link to what the graph looked like when things got seriously bad (yikes!).
It's still lagging for me =/