April 13, 2012 :: Emergency (planned) Jetstream hard drive replacement

Started by Jason, April 12, 2012, 08:57:00 PM

Jason

Jetstream primary hard drive replacement
Friday, April 13, 11pm EST (revised from 10pm EST)
-----------

(A link to this post will be sent to all customers with account(s) on our server "Jetstream" and posted on our Server Status site).

Over the past 2-3 weeks, we've seen a few instances where this server became unresponsive during daily backups and had to be rebooted. Multiple hardware scans today indicate that the primary (main) hard drive in this server may be close to failing.

In situations like this where there's uncertainty around how long the drive will last, it's always best to attack it proactively on our own terms rather than waiting for it to fail, at which point we risk losing significant data. Because of this, I am planning to do an emergency replacement of that hard drive overnight tomorrow (Friday) starting at 11pm EST (revised from 10pm EST). Given our global base of customers, I realize this isn't going to be ideal for everyone, but my hope is that it has the least impact on everyone given the time that will be required to complete this task.

The way the process will work is that we'll run server-wide backups manually prior to that time. (You do not need to run your own backups for this exercise, though backing up regularly is something everyone should do for data security.) At that time, the datacenter will take the server offline, replace the hard drive with a new one, install the operating system, install cPanel and turn it back over to us. We'll then restore our custom configurations and start restoring accounts from our backups. This entire process can realistically take anywhere from 5 to 10 hours, possibly a bit longer if anything unexpected arises. The longest tasks tend to be software setup-related, where we can't influence the speed. The good news is that this server is one of our most powerful, so we have that working in our favor.
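
For anyone who wants to translate that estimate into clock times, here is a minimal sketch in Python. It assumes the revised 11pm EST start and the 5 to 10 hour range quoted above; the function name `completion_window` is purely illustrative, and the result is a rough window, not a guarantee.

```python
from datetime import datetime, timedelta

def completion_window(start, min_hours, max_hours):
    """Return the (earliest, latest) expected finish times for a maintenance job."""
    return start + timedelta(hours=min_hours), start + timedelta(hours=max_hours)

# Assumption: revised start of Friday, April 13, 2012, 11pm EST,
# stored as a naive datetime (no time zone conversion is attempted).
start = datetime(2012, 4, 13, 23, 0)

# Assumption: the 5 to 10 hour estimate from the announcement.
earliest, latest = completion_window(start, 5, 10)

print("Earliest expected finish:", earliest.strftime("%Y-%m-%d %H:%M"))
print("Latest expected finish:  ", latest.strftime("%Y-%m-%d %H:%M"))
```

With those inputs the window runs from 4am to 9am EST on Saturday, April 14.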

We will have techs watching the server overnight tonight in case we experience a similar failure during tonight's backup.  In the event the drive experiences a complete failure between now and tomorrow night's scheduled maintenance, we will begin the process immediately.

If you have any questions, please post them here.   Thank you!

Regards,
Jason

------
edit:  fixed a couple typos.
edit 2:  Revised start time from 10pm EST to 11pm EST due to datacenter conflict at the earlier time

Jason

Please note, I just revised the time on this maintenance from 10pm EST to 11pm EST.

The datacenter has a conflict with the 10pm slot so we'll move it back 1 hour.

Thank you,
Jason

Jason

We will be taking the server offline to begin this maintenance momentarily.

I will keep this thread updated as we proceed.

Jason

Current status as of 12:30am EST Saturday --

The hard drive has been replaced.  The OS (operating system) has been installed and updated.  cPanel is now installing.  This step typically takes around 2 hours.

When that completes, we'll set up and secure the server and then begin restoring accounts. That part of the process tends to take less time than the first parts, so we may end up ahead of schedule if all goes well.


Jason

Update:  The cPanel install is complete.

We're working on re-securing the server and setting up our previous configurations.  Once complete, we'll begin restoring accounts. 

Thank you for your patience.

-Jason

Jason

Update:  The server settings are being wrapped up now.  If all goes well, we'll start restoring accounts within the next half-hour.

-Jason

Jason

Approximately 25% of the accounts are restored and online. I hope we'll be done within the next 2-3 hours, which would put us ahead of schedule.

Regards,
Jason

Jason

All accounts finished restoring at approximately 6:15am EST (about 45 minutes after my last post).

If you are having any issues, please let me know. 

Thank you for your patience!

Regards,
Jason