PSX Excessive recently suffered a rather severe server failure, which took our website offline between December 30th and January 6th. This was the longest unplanned outage in our 24-year history.
So, what happened?
To simplify a rather long story, the database server that powers PSX Excessive malfunctioned. We tried to repair the database server numerous times, but our attempts were unsuccessful. In fact, we ended up making things worse: the database that powers our website became irreversibly corrupted.
Our only real solution at this point was to completely wipe our server clean and reinstall everything from the ground up. On paper, this should have been an easy thing to do. Reinstall the operating system, reconfigure our control panel. Easy. Time consuming, no doubt. But most definitely an easy task.
Except absolutely nothing went correctly.
Downloading The Backups
PSX Excessive has four primary and completely different backup methods. Each method is intended for a different type of hardware or software failure. For instance, we back up our database, posts, pages, and primary directories to the cloud once every 24 hours. This method is great for when we need to quickly revert a day or two. The downside? It doesn’t back up the entire website and all directories; it only backs up what’s required to keep the core of our website operational. In other words, the absolute basics.
We also create a full backup of our entire website and all directories inside our main web folder. This backup is an exact replica of our website as it appeared on the date the backup was created. Unfortunately, we only run this clone-based backup once every seven days, which, for a high-content website like PSX Excessive, is not ideal. However, it’s a fallback that’s nearly guaranteed to work.
That is also the backup method we opted to use.
The actual act of downloading the backups from the server and storing them on our local drive took roughly 36 hours. PSX Excessive is a large website and contains over 400GB of total data.
Easy, but time consuming.
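For a sense of scale, the figures above (roughly 400GB in roughly 36 hours) work out to a fairly modest average transfer rate. The snippet below is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes decimal units (1GB = 1000MB).

```python
# Rough arithmetic only. The 400 GB and 36 hour figures come from the post;
# the function names and decimal units (1 GB = 1000 MB) are assumptions.

def average_throughput_mb_per_s(total_gb: float, hours: float) -> float:
    """Average transfer rate in megabytes per second."""
    return (total_gb * 1000) / (hours * 3600)


def hours_to_transfer(total_gb: float, mb_per_s: float) -> float:
    """Hours needed to move total_gb at a steady mb_per_s."""
    return (total_gb * 1000) / mb_per_s / 3600


rate = average_throughput_mb_per_s(400, 36)
print(f"{rate:.2f} MB/s, about {rate * 8:.1f} Mbit/s")
# prints "3.09 MB/s, about 24.7 Mbit/s"
```

At that rate, every additional 100GB of site data would add about nine hours to a full download.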
Restoring The Backups
Unfortunately, this is where things started to take a turn for the worse. While downloading the backup files wasn’t overly complicated, just time consuming, the same can’t be said for the restoration process.
We had to upload the compressed backup files to the server and then run a restore command. Unfortunately, every single time we tried, the restore process failed. We tried this multiple times, wasting roughly three days. Each time, the restore would get to about 95% and then hang for several hours before eventually failing. Since we were restoring a rather large file, having the restore process hang was normal and expected. Having it crash? Not as normal or expected.
Once we finally got the site restored, we tried to restore one of our cloud backups to get as close to our previous live site as we could. Unfortunately, restoring the cloud backup ended up corrupting our database, requiring that we wipe the database and reinstall the original backup again. Each time we had to do a new restore, we had to sit and babysit the restoration process for a whopping four hours.
So now, an additional eight hours had been wasted just trying to restore a working backup. But finally, it was done. Things were no longer crashing. All was good in the world!
And Now We’re Here
PSX Excessive is back online. Things are not entirely stable quite yet, but at the very least, we’re functional. We can once again contribute content to our website, and all core functionality is good to go.
And yet, things are still somewhat unstable. We’re slow, and have several visual bugs and glitches that have yet to be fixed as of this writing. But at least we’re back online, right?
I want to thank everyone for your patience. Restoring PSX Excessive was no easy task, even if it was supposed to be an easy task on paper.
Preventative Measures
To try to make sure that this never happens again, we have implemented a new caching method on our website, which should speed things up rather significantly. Beyond that, we’re also going to create full cloned copies of the entire public directory every 24 hours, to more or less match the schedule of our cloud-based backup service.
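As a rough illustration of what a daily full clone could look like, here is a minimal sketch that archives a directory and stores a checksum next to the archive, so a damaged copy can be spotted before anyone spends hours trying to restore it. This is not our actual tooling: the function names, the archive naming scheme, and the choice of a gzip-compressed tar file with a SHA-256 checksum are all assumptions made for the example.

```python
import hashlib
import tarfile
from datetime import date
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_full_backup(source_dir: Path, dest_dir: Path) -> Path:
    """Archive source_dir into dest_dir and write a checksum file beside it.

    Hypothetical helper: dest_dir should live outside source_dir.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"full-{date.today().isoformat()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    checksum_file = archive.with_name(archive.name + ".sha256")
    checksum_file.write_text(sha256_of(archive))
    return archive


def verify_backup(archive: Path) -> bool:
    """Return True if the archive still matches its recorded checksum."""
    checksum_file = archive.with_name(archive.name + ".sha256")
    return sha256_of(archive) == checksum_file.read_text().strip()
```

Running `create_full_backup` from a daily scheduled job and calling `verify_backup` after every download or upload would at least separate "the file is damaged" from "the restore tool is misbehaving".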
We’re also going to rely a lot less on remote cloud backups, considering that these haven’t, thus far, been of any real value. This was supposed to be our most secure and most reliable method of backup and recovery. Instead, it became the least reliable of the bunch.
We will also be looking into the possibility of hosting our website on a different hosting network. Right now, we run our own servers, and more or less provide and do everything ourselves. This is fine when it works, but as we just discovered, it’s a real pain in the ass when things hit the proverbial fan.
All in all, we’re back online. Hopefully for good this time around.