Charlottezweb

Charlottezweb Hosting => Server Updates & Outages => Topic started by: Jason on March 08, 2009, 11:39:15 AM

Title: March 8, 2009 :: Power failure impacting two servers (Tempest / Tornado)
Post by: Jason on March 08, 2009, 11:39:15 AM
Around 3:05am EST this morning, a power failure at the datacenter impacted 2 of our servers (Tempest and Tornado) for approximately 15-20 minutes.

Seeing as they've had a couple of power-related issues in the past couple of years, I'm hoping to get more of an explanation, but here's what they've posted so far:

Quote
One of our 35 feeder panels tripped its main breaker. Appears there was a defective breaker in the panel that failed and caused the panel breaker to trip. All circuits except the one that caused the failure are back up after a minute. Power to the one that failed is being rerouted. We will examine it when the EE comes by on Monday to see why a defective breaker would trip it out. It could have caused a short in it. None of the customers who have B feed redundant power from us were affected by this issue.

A few customers had MySQL databases corrupted by the sudden shutdowns, so if you notice any errors, I would first recommend running a repair on the database in cPanel under the MySQL icon.
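
For anyone who has shell access and would rather not use cPanel, here is a rough sketch of a command-line equivalent using MySQL's `mysqlcheck` client. It only builds and prints the command so you can review it before running it yourself. The names `example_db` and `example_user` are placeholders (assumptions, not real accounts), and the helper `build_repair_command` is made up for illustration. Note that repair applies to MyISAM tables; other storage engines may simply report that repair is not supported.

```python
import shlex


def build_repair_command(database, user):
    """Return the argument list for a mysqlcheck check-and-repair run.

    Hypothetical helper for illustration only; it does not run anything.
    """
    return [
        "mysqlcheck",
        "--check",          # check every table in the database for errors
        "--auto-repair",    # repair any table that fails the check
        f"--user={user}",   # MySQL account to connect as (placeholder value)
        "--password",       # no value given, so mysqlcheck prompts for it
        database,           # database to check (placeholder value)
    ]


if __name__ == "__main__":
    command = build_repair_command("example_db", "example_user")
    # Print a copy-and-paste-safe version of the command for review.
    print(shlex.join(command))
```

If the tables still show errors afterwards, the cPanel repair option remains the first thing to try.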

I'm going to follow the discussion on the datacenter's site to see what further information I can get.
Title: Re: March 8, 2009 :: Power failure impacting two servers (Tempest / Tornado)
Post by: Jason on March 09, 2009, 08:57:51 PM
Here's the RFO (reason for outage) report, updated tonight by the datacenter:

Quote
while moving a customer's rack of servers a breaker in a panel went bad - it caused the entire panel main breaker to trip - why? not sure - the electrical engineer speculates that the breaker was going to go bad and the draw of voltage from the new servers caused it to hasten its demise which caused a ground fault scenario on its connection. when this happens the main breaker will trip upstream of it to protect the rest of the system. The breaker that went bad also caused the IPMI cards in the servers attached to that breaker to fry - which is further evidence of a ground fault.

we measured the loads on the panel and they are 56, 68 and 74 percent respectively, and the 56% leg is the one the faulty breaker was on - it was at 69 before - so the panel is underloaded and the additional server load did not trip out the panel.

we asked how we can prevent this in the future and the technician told us we cannot - it's just one of those things that can happen from time to time since mechanical things can fail.

this breaker was 3 years old and well maintained - in fact it had just passed an infrared scan test from an outside firm 3 months ago.

In our assessment we did not do anything wrong on this and there is nothing procedurally for us to change.
Title: Re: March 8, 2009 :: Power failure impacting two servers (Tempest / Tornado)
Post by: Finner on March 10, 2009, 01:28:58 AM
Jeff and his power issues  8) 

imo gnax should have their power issues worked out by now...

or, they could just keep expanding  ::)
Title: Re: March 8, 2009 :: Power failure impacting two servers (Tempest / Tornado)
Post by: Jason on March 10, 2009, 07:52:54 PM
Ha!

Well, it's definitely dependent on your perspective, I guess. I can't fault them for staying competitive with the things they feel the market is seeking. To overlook that would hurt their income, which would ultimately impact all of us if they ever ended up shutting down. In a time when companies are folding, it's nice to see they're staying fiscally sound and growing.

At the same time, some of the core things should be covered (power among them, in my opinion). From the report, this is one particular outage I can't fault them for, although I disagree with some of their follow-up responses on buying dual power feeds. In a "must be online" situation, sure, and it's great that they offer it. Without going into detail, I know they've gone to great lengths (especially after the last outage) to ensure stability there, but as a consumer, it's very frustrating to see this happen. (I'm not an electrical engineer, but I went down there in October and have seen how robust their setup is -- at some point I'll be adding some details and pics to our site.) They did have it all independently audited a few months back after the last outage, if I recall correctly.

I haven't made any decisions on Dallas yet. For now I'll stay (and expand) out of Atlanta unless the reports show Dallas to be a better built DC. From what I've seen of ATL, I don't think that will be necessary.

(Along the expansion lines, they announced a new backup setup last week that's very exciting.  More to follow here in the next month as we look to take advantage of it!)