Charlottezweb

Charlottezweb Hosting => Server Updates & Outages => Topic started by: Jason on January 31, 2006, 06:17:57 PM

Title: Jan 31, 2006 :: Cyclone issue
Post by: Jason on January 31, 2006, 06:17:57 PM
I'm working on a Cyclone issue right now. 

I will update this post shortly with more info as I get it.

Regards,
Jason
Title: Re: Jan 31, 2006 :: Cyclone issue
Post by: Jason on January 31, 2006, 06:50:12 PM
No updates at this time.  I'm in communication with hands-on support at the datacenter.  The server was unresponsive at the time, and they're still working on it.

Updates will be here as soon as I have them.

Thank you for your patience.

-Jason
Title: Re: Jan 31, 2006 :: Cyclone issue
Post by: Jason on January 31, 2006, 06:53:56 PM
Service is restored...

More info to follow.
Title: Re: Jan 31, 2006 :: Cyclone issue
Post by: Jason on January 31, 2006, 07:59:16 PM
Here's an explanation of what occurred this evening.

I received a page/call from Alertra that httpd was down.  I immediately checked, and my support had already opened an "unresponsive server" alert.  (They beat me to it, less than 3 minutes into the outage.)

In this case, the server was completely unresponsive and required a hands-on reboot.  All of that was accomplished within approximately 25 minutes (a little slow due to some other circumstances).

However, this is our newest server, and an incorrect network gateway setting kept it off the Internet even after it was rebooted and running.  That added time to the outage, a delay that should never have occurred.  The gateway setting has been corrected to prevent further issues.

The type of failure that we experienced generally points to a hardware issue.  I had a memory test run, and it failed -- not a good sign.

So, we will be replacing the memory within the next 30 minutes.  I made the decision to do this now in a controlled setting, rather than schedule it for a more convenient time.  Waiting would risk further outages in the meantime, which could cause software problems that we don't need.

Therefore, the server will go offline -- hopefully for less than 15 minutes -- within the next short while to have new memory installed.

This should greatly reduce the chances of a memory-related failure in the future.

Thank you for your patience.  I will update this thread once complete.
Title: Re: Jan 31, 2006 :: Cyclone issue
Post by: Jason on January 31, 2006, 08:58:36 PM
Memory swap is complete and the server is running normally again.

The swap took just under 9 minutes.

We then re-ran our memory test (which raised the server load while it ran), and it passed successfully.

We'll continue to monitor for possible hardware issues in case this recurs; however, we believe bad memory to be the culprit at this time.

Regards,
Jason