Charlottezweb

Charlottezweb Hosting => Server Updates & Outages => Topic started by: Jason on September 01, 2007, 08:38:09 AM

Title: 09/01/2007 :: DNS service on Blizzard down
Post by: Jason on September 01, 2007, 08:38:09 AM
We're working on Blizzard right now.  The BIND (DNS) service on that server started failing last night, apparently about an hour after the datacenter outage (http://www.charlottezweb.com/forums/index.php?topic=811.0), and has been trying to restart itself ever since.

The server itself is up (which is why Alertra isn't reporting a problem), but without BIND, sites aren't reachable.

I will update this thread asap.
Title: Re: 09/01/2007 :: DNS service on Blizzard down
Post by: Jason on September 01, 2007, 09:36:00 AM
Update:  We're still working on BIND.  It needs a repair, which we're running now.
Title: Re: 09/01/2007 :: DNS service on Blizzard down
Post by: Jason on September 01, 2007, 09:52:52 AM
Service has been restored.  I will update this thread later today with what I hear back.  I've been given a cause for last night's datacenter outage that I'm unhappy with, and I'm waiting for a follow-up reply from the owner.

In the meantime, I've contacted our monitoring service, Alertra, and am considering upgrading our monitoring to cover individual services instead of just an HTTP check.  I will update you on that as well.
Title: Re: 09/01/2007 :: DNS service on Blizzard down
Post by: Jason on September 01, 2007, 06:52:08 PM
Follow-up:  I will update the network outage thread with more details as well, but in terms of what happened to Blizzard: the BIND (DNS) service became corrupted shortly after the network outage last night.  The service then tried to restart itself and wasn't able to recover.  We rebuilt BIND this morning and restored all service.

The slow response came down to the fact that BIND is not a service we actively monitor at present.  Alertra (our monitoring company) checks that the server is up and that the hostname page is reachable.  Neither of those failed last night, so I wasn't alerted to the second outage on this box, and it went several hours before being noticed.
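
To illustrate the gap described above, here is a minimal sketch of a check that asks the name server a question directly instead of fetching a web page. It is not the script Alertra runs. The function names, the default transaction id, and the 3-second timeout are assumptions made for this example. Any DNS response with a matching id counts as "answering", even an error reply.

```python
import socket
import struct


def build_dns_query(name, txid=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends it.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.strip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def is_valid_reply(reply, txid=0x1234):
    """True if the bytes look like a DNS response to our query id."""
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    # 0x8000 is the QR bit, which is set on responses and clear on queries.
    return reply_id == txid and bool(flags & 0x8000)


def dns_is_answering(server, name, timeout=3.0):
    """Send one UDP query to port 53 on server and wait for any response."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    return is_valid_reply(reply)
```

A check like `dns_is_answering(server_ip, some_hosted_domain)` would have failed during last night's outage while the existing HTTP check kept passing. Both arguments are placeholders for real values.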

I've contacted Alertra today about upgrading our monitoring and will post a separate thread if/when we go that route.
Title: Re: 09/01/2007 :: DNS service on Blizzard down
Post by: Jason on September 04, 2007, 06:13:06 PM
Quick follow-up:  I'm still in discussions with Alertra and will *definitely* be adding multiple-service monitoring to all servers as soon as possible -- I'm negotiating the setup of a custom test script now.

At a minimum, I will be adding monitoring for MySQL and Bind (DNS) to ensure that should one of these services go down on your server, I will be notified via pager/cell quickly.  As always, a server outage is monitored 24x7x365 by my support staff and is addressed whether I'm available or not.  That coverage will not change.
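
As a rough idea of what a multiple-service test could look like, here is a minimal sketch. It is not the script being negotiated with Alertra. It assumes the services listen on their default TCP ports (80 for HTTP, 53 for BIND, 3306 for MySQL) and that an accepted TCP connection means the service is up, which is a weaker test than a real query. The names `SERVICES`, `check_tcp` and `failed_services` are made up for this example.

```python
import socket

# Default ports, assumed for illustration; a real server may differ.
SERVICES = {"http": 80, "dns": 53, "mysql": 3306}


def check_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port is accepted in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def failed_services(host, services=SERVICES, probe=check_tcp):
    """Return the sorted names of services whose probe did not succeed."""
    return sorted(
        name for name, port in services.items() if not probe(host, port)
    )
```

An empty list from `failed_services(hostname)` means every probed port accepted a connection; anything else is the list of services to page about. The `probe` argument exists so that a stricter per-service check can be swapped in later.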

Thanks,
Jason
Title: Re: 09/01/2007 :: DNS service on Blizzard down
Post by: Finner on September 05, 2007, 12:58:16 AM
thanks J