April 27, 2014 :: Scheduled Avalanche maintenance, two brief outages expected

Started by Jason, April 26, 2014, 04:01:13 PM

Jason

For customers on our server Avalanche, a very extended outage occurred yesterday due to a combination of multiple hard drive failures and a RAID controller card failure on the parent server. The full details (including all the updates I posted throughout the day) and the datacenter's write-up of what occurred are available in this thread:

http://www.charlottezweb.com/forums/index.php?topic=1968.0

As a result of that issue yesterday, the datacenter would like to move us to a more stable parent server to ensure we don't experience anything like this again.

Rather than waiting for the business week to arrive, I've scheduled this to start at 4am Eastern tomorrow (Sunday) morning.

We've had to do this in the past on a few of our other servers. It's a normal process the datacenter follows when it needs to upgrade or perform (non-emergency) maintenance on a parent server. It involves an initial brief outage (usually around 15 minutes), after which the server is back online while data transfers in the background to the new parent. The transfer can take anywhere from a few hours to 10 or more. During this time, sites stay online and there is typically no noticeable performance impact. When the transfer is complete, a second brief outage occurs to sync up and finalize the move.

I will be sending an email notification to all customers with active accounts on Avalanche within the next 15 minutes.

If you have any questions, please post them here.

Thank you,
Jason




Jason

This maintenance started as scheduled at 4am Eastern. Pingdom recorded 3 minutes of downtime as the server was rebooted to initiate the transfer.

The transfer is running at about 10% an hour (we were at 31% completion around 6:45am), so we'll likely complete sometime in the late afternoon or early evening Eastern time.

I will keep this thread updated.

Thanks,
Jason


Jason

I received an update that it's nearing completion -- quite ahead of schedule.

It looks like the server is down for the final reboot/resync.

I will update this thread once complete.

Jason

The move has completed, and the server is now running a file system check as the final step.

This is taking a while to complete, but the datacenter is monitoring it and we've had no errors so far.

It should wrap up shortly.

Jason