October 18, 2007 :: Thunderstorm outage

Started by Jason, October 18, 2007, 10:05:35 PM

Jason

We've been working on Thunderstorm tonight as part of troubleshooting some ongoing update issues.  The server became unresponsive at one point and had to be rebooted.  We ran some scans upon bringing it back, which caused it to take longer to come back online.

The datacenter is continuing their work with technicians from cPanel right now.

I'll post further updates as they're available.

Jason

Looks like we have some good news from cPanel, and we think our issues are resolved.  There's one more thing I'm going to test tomorrow, but so far we've been seeing much better performance since fixing a problem with yum a few days ago.

Jason

As a followup on this thread....  We're still investigating some stability issues.  Now it looks like a suspected hardware problem on this box.

Presently there are fewer than 5 accounts on this server because I've been holding off on moving the others until I'm confident the box is solid.  One of the sites on this server is Charlottezweb.com, however.

We're going to take this box offline later this week (likely late night EST) to test the hardware.  It's hard to say how long the outage to the box will last but I'm assuming a minimum of 30 minutes to an hour and possibly longer depending on what the results do/don't show.

I will update this thread (and shoot all clients an email) advising of the outage when I schedule it just so you don't think Charlottezweb is disappearing for good.  :)

Please follow this thread for updates.

Jason

Update:  I have scheduled the maintenance window for Thunderstorm for tonight at 11pm EST.

I will send all clients an email today directing them to this thread and announcement.

Note:  During this outage, Charlottezweb.com will be completely unreachable.  (That will include the site, Client Area, forum and this post, of course).  I want everyone to know now since it could potentially take a few hours if we need to check everything and I don't want people to worry that we're gone!   

In the event that Charlottezweb.com's site is down longer than a few hours, I will post updates to our old status site:  http://charlottezwebhelp.com/

Just so we're all clear, this is for the server Thunderstorm and it will only impact the 4-5 sites on that server -- one of which is Charlottezweb.com.

Thanks

Jason

This outage is still on schedule to begin in the next few minutes. I will update this thread after we're back up with what the outcome is.

Thanks

Jason

We're back up as of approx 4:20am EST.  (A little under 5 and a half hours).  The hardware scans returned nothing abnormal.  The next step would be a forced fsck, which could take anywhere from a few minutes to several hours or more to run.  The server would be down during that time and there's no way to know in advance how long it would take.

I'm going to look into things today (after some sleep) but I've opted not to run the fsck at this time based on how long it could potentially take.  Instead, I will likely look to replace this server with a brand new one as soon as possible early next week.

I will reach out to the few clients on this box when I know more about when that will occur. 

Thanks