Dec 6, 2007: Trouble reaching your site today?

Started by Jason, December 06, 2007, 02:52:54 PM

Jason

If you're having problems reaching your site(s) today, please open a ticket and provide me a tracert to your domain name.

I've received two reports of this so far today and found a thread discussing the same thing on the datacenter's support forum. It's not a server problem -- it's network related -- and a tracert will help us point the dc in the right direction.
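If you're not sure how to capture a tracert, here is a minimal sketch in Python. It assumes the stock `tracert` command on Windows and `traceroute` on Linux/macOS (which may need installing on some systems). The function names `trace_command` and `run_trace` are made up for this example, and `example.com` is only a placeholder for your own domain.

```python
import platform
import subprocess


def trace_command(host, system=None):
    """Return the route-tracing command for the given OS name.

    `system` defaults to the local platform; pass it explicitly to
    see what another OS would use.
    """
    system = system or platform.system()
    if system == "Windows":
        return ["tracert", host]
    # Assumption: Linux, macOS and the BSDs ship (or can install) traceroute.
    return ["traceroute", host]


def run_trace(host, timeout=120):
    """Run the trace and return its text output for pasting into a ticket."""
    result = subprocess.run(
        trace_command(host),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Some tools write their hop list to stderr when the trace fails.
    return result.stdout or result.stderr
```

Calling `print(run_trace("example.com"))` with your own domain in place of the placeholder prints the hop list; paste that output into the ticket as-is.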

I can't see any outages but it looks like some incoming routes may be affected.

Thanks

Jason

Note:  I'm starting to get more reports of this from customers overseas.

The datacenter is actively looking into this and I hope to see a resolution quickly.

Jason

From the dc:

-emergency router maintenance-

Quote
this is cisco related - we are going to go ahead and replace this supervisor card with an upgraded card we have ready to go.

we are going to replace our other router card tomorrow with an upgraded version.

this upgrade will happen in the next few hours and there will be some limited outages that are intermittent.

it will only be during the time that the card is swapped and the routers get their tables back.

Finner

This is only affecting some members

There are 2 members so far that can't reach the site

Is it because of this?

Jason

I would say it depends on their geographic location but it's quite possible.  Are they in Europe by chance?

Redeye

Whatever they've done overnight seems to have fixed the problem.
:: Ride Safe - Live Long ::

Jason

Excellent.

The fix went in at 12:11am EST last night (about 8 hours ago). 

They swapped the supervisor card for a replacement.  That didn't cause a total loss but some parts of the network were unreachable for a few minutes. 

Our server Firestorm was reported down by Alertra because of this for about 10 minutes.  (It was not down, merely unreachable.)

From the dc:

Quote
this is done and back up.

all looks good.

the route science box will take over night to rebuild its metrics and then tomorrow afternoon everything should be smoking again.

we are going to try to do an upgrade on the other cores over the next few days as well.

I will keep you posted if we do.

So I would say that if you were impacted yesterday you should be back up. If it's not as fast as normal, please give it a while today.  Route Science is the system that evaluates incoming and outgoing traffic and determines the fastest routes.  It will take a while to rebuild its tables -- in theory, the more traffic it handles, the faster the routing gets.
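To picture why the routing needs time and traffic to recover, here is a toy sketch in Python. It is not how Route Science actually works; I don't have details of its algorithm. It only assumes the general idea described above: keep measurements per upstream and prefer the fastest one. The class `RouteMetrics` and its methods are hypothetical names for illustration.

```python
from collections import defaultdict
from statistics import mean


class RouteMetrics:
    """Toy model: collect latency samples per upstream and pick the fastest.

    Illustration only; not the datacenter's real route optimizer.
    """

    def __init__(self):
        # Empty after a reset (e.g. a router card swap): nothing measured yet.
        self.samples = defaultdict(list)

    def record(self, upstream, latency_ms):
        """Store one latency measurement for an upstream."""
        self.samples[upstream].append(latency_ms)

    def best(self):
        """Return the upstream with the lowest average latency, or None."""
        if not self.samples:
            return None  # no data yet, so no informed choice is possible
        return min(self.samples, key=lambda u: mean(self.samples[u]))
```

Right after a reset `best()` has nothing to go on. Each new measurement from real traffic refines the averages, which is the sense in which more traffic makes the routing better.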

Regards

Jason

I received a report that Firestorm is unreachable again right now, and I'm unable to get to the datacenter's site, so my assumption is that they're doing another update or something else is wrong.  I will update this thread as I learn more.

Jason

Quick update:  I called the dc just now (just to make sure) and they're aware they're unreachable and have all hands on deck to resolve it.  On our side, Firestorm is the only server presently showing as unreachable.

Jason

Firestorm is reachable again (as is the dc). 

I'll post an update here when they post the cause/resolution.

Jason

The dc just updated their thread: 

Quote
We will be upgrading another gold router in approximately 1 hour. This should smooth over remaining issues. The majority of our upstreams will stay online but there may be some short scattered outages while this is done.

It's possible we may see a little more activity during this process.

Jason

From the dc:

Quote
This was done and everything is back up, things look good. It will take a little while for the route science metrics to rebuild on the upstreams that were affected.

We're still watching for any issues so if you have any problems please open a ticket for us with a traceroute and we'll look into it right away.

Thanks

Jason

Further work is scheduled for later tonight (Monday morning EST) to fully resolve what is still happening for some inbound routes.

From the dc:

Quote
We will be performing network maintenance on Sunday, Dec 10th at 3:00AM EST on routers on the silver network. This will cause limited outages and possibly some intermittent connection loss on the following upstreams:

PCCWBTN (silver), Telia (2) (silver) - expected duration ~ 30 minutes
Cogent (silver), Telia (silver) - expected duration ~ 30 minutes

Jason

Hopefully the final update on this:  Their maintenance was completed early this morning.

We should all be set for now.  If you continue to see any issues, please open a support ticket and provide me a tracert to your domain so I can open individual tickets with the datacenter for research.

Thanks.