Replies

Mar 1, 2015 11 years ago Official
Keith
is sweet
Eradication

Yeah, unfortunately there wasn't a setting for memcache and the admins who have access can only click "this thing is down", not make new services.

I'm also giving more non-technical staff members access to that panel and the ability to mark things, so that should help a lot going forward. Yesterday was absolutely a failure of the system :(

💖 ✨ 🤗

Mar 6, 2015 11 years ago Official
Amber
is bitter
Taco

Last night when I brought the site down, I tried to get as detailed a message as possible up on the front page and replied to people on FB/Twitter until 6am when took over. I honestly forgot to update the status page while replying to people and trying to get people online (it's a pretty new thing we are using, and this was my first time being the main one on while things break). Next time I can def keep the Twitter feed more updated, though last night (letshopethereisn'tanextime) it would've been the same "still calling, site still down" haha.

Mar 11, 2015 11 years ago Official
Carol
had too many

We're actually working on a bit of a hub on the back-end that will allow us to update "everything" (the actual "site is down" page, the technical status page, and our social media sites) in one fell swoop. This will help whoever is online at the time of trouble get the message out to as many channels as possible.
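To give a rough idea of the shape of that hub, here is a minimal sketch, not our actual code. The `StatusHub` class and the channel names are made up for illustration, and plain lists stand in for the real pages and social accounts. The idea is just that one call fans a message out to every registered channel and reports which ones failed.

```python
from typing import Callable, Dict, List

# Hypothetical channel type: anything that accepts a message string.
Channel = Callable[[str], None]


class StatusHub:
    """Fans one status message out to every registered channel."""

    def __init__(self) -> None:
        self._channels: Dict[str, Channel] = {}

    def register(self, name: str, send: Channel) -> None:
        self._channels[name] = send

    def broadcast(self, message: str) -> Dict[str, bool]:
        """Send to all channels; one failing channel does not stop the rest."""
        results: Dict[str, bool] = {}
        for name, send in self._channels.items():
            try:
                send(message)
                results[name] = True
            except Exception:
                results[name] = False
        return results


# Stand-ins for the real destinations (assumed names, in-memory only).
down_page: List[str] = []
status_page: List[str] = []

hub = StatusHub()
hub.register("down_page", down_page.append)
hub.register("status_page", status_page.append)
results = hub.broadcast("Site is down, we're on it.")
```

Catching failures per channel means that if, say, a social media post fails, the "site is down" page still gets updated.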

Mar 27, 2015 11 years ago Official
Keith
is sweet
Eradication

So tonight's status.subeta.net update was, again, my fault. The staff asked in our Slack channel about the 404 pages and I responded saying "That's what happens when new servers spin up" (my same response as on Tumblr), then the problem ended up being something entirely different.

I tried to update the status site from my phone, but unfortunately the dashboard wasn't really set up for mobile and I couldn't submit. Instead I sent a status report and a fix to who resolved the problem and brought the 404 pages to an end by removing the faulty server from our load balancer.

Some takeaways:

  1. Fix the status site for us updating from mobile, obviously.
  2. Change the algorithm for removing servers from the load balancer automatically. Right now it removes a server only if it can't be reached over a server-to-server connection (SSH) or can't be loaded at all in a browser. That doesn't account for 404 and similar errors. I'm going to change the health check so that a server returning a 404 gets removed from the load balancer too.
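As a rough sketch of the second takeaway (not our real health check): the function names and the `Probe` type below are invented for illustration, and canned results stand in for real HTTP requests. It assumes any 4xx or 5xx response counts as unhealthy, not just a 404, and that an unreachable server is reported as `None`.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical probe: takes a server name and returns the HTTP status code
# of a test request, or None if the server could not be reached at all.
Probe = Callable[[str], Optional[int]]


def is_healthy(status: Optional[int]) -> bool:
    """Healthy only if the server answered with a 2xx or 3xx status."""
    return status is not None and 200 <= status < 400


def split_pool(servers: Iterable[str], probe: Probe) -> Tuple[List[str], List[str]]:
    """Probe each server once and return (keep, remove) lists."""
    keep: List[str] = []
    remove: List[str] = []
    for server in servers:
        if is_healthy(probe(server)):
            keep.append(server)
        else:
            remove.append(server)
    return keep, remove


# Canned results instead of real requests: web-2 serves 404s,
# web-3 cannot be reached at all.
fake_results = {"web-1": 200, "web-2": 404, "web-3": None}
keep, remove = split_pool(fake_results, fake_results.get)
print("keep:", keep)
print("remove:", remove)
```

With the old rule only `web-3` would have been pulled; with the status check, `web-2` goes too.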

We're working on this; our goal is obviously complete communication and transparency. I also realized I didn't have the Subeta Twitter on my phone, which I corrected tonight as well.

💖 ✨ 🤗
