First of all, I want to say how sorry I am you guys are having to deal with all of these errors. We know it sucks, and trust me when I say we hate it just as much as, if not more than, you do.
The problem with this sort of thing is that it can take a lot of trial and error to actually figure out what is causing it. And we might think it's one thing, and tell you guys "We think we figured it out!" and be completely wrong. So we've been trying to be a little more conservative with that lately, despite the fact that we actually have thought we figured it out a few times.
We are working on a few different things that could be behind it, but they also might not be. The truth is that right now we don't know. We have made some educated guesses based on logs and patterns, but we don't know. And it's hard to come out and admit that we don't know what's wrong or how long it will take to fix it.
So many little tiny moving parts are altered in a server move. We know that something is different between the Rackspace setup we used to have and the AWS setup that has severely affected our reliability since moving. To be clear, I am not saying AWS is to blame, because they host some of the biggest sites on the internet without any problems. But there is something that changed between there and here that is the culprit, and even if we're not explicitly telling you about it, we're working hard to figure out what it is.
it is definitely something that can be fixed, because like we were fine on rackspace. that's not because rackspace is better than aws, but that there's something different in our aws configuration that doesn't play as nice with our code. we're also looking at ways of trying to optimize bottlenecks in the code itself, which is pretty important, too. having a server configuration that works with our old, awful code is all well and good, but even with a good server configuration things will be so much smoother if we fix those bottlenecks.
it's just a matter of figuring out the bottlenecks and the parts of the configuration that aren't compatible (because let's be real, assuming it's only one thing is a pipe dream).
I've added 'Staff Forum Posts' (aka the admin post feed) under "Check Out" on the news page. :) Hopefully, people will start keeping up with the posts, maybe checking them once a week or something.
As we posted last month when we brought attention to the Admin Posts Feed, it’s a good resource for people to see posts we (staff) have made. Posts in the feed are usually about issues in P&B, helping users out, etc. I put it in the “Check Out” section of the news because someone made a good point that not everyone uses the forums, so of course those users are less likely to use the drop-down menu on the forums interact navigation or go to the main page of the forums to find admin posts. I don’t want those users to miss out on questions I or other staff members have answered because they didn’t know about the Admin Feed. :)
We are looking into a status page, though, for when the site is down or having issues, but that's not really what I was trying to address with the admin post thing.