August 6, 2012 :: Tsunami issues for some customers

Started by Jason, August 06, 2012, 06:46:39 PM

Jason

I wanted to post on an issue we've been working on since this morning.

I had a few customers report overnight that their sites were unreachable off and on.  I had a few more that reported it this morning as well.

Our uptime monitoring shows that there is definitely something going on, although it's sporadic and appears to be impacting only certain geographic regions.

Our uptime site allows you to click on individual servers and view all the disruptions that are tracked.  Pingdom checks every server every single minute of the day so the data is pretty nice.  I can also pull traceroutes and other details for each connection issue it records.

The pattern of "disruptions" looks regular on Pingdom.  That leads me to believe that, as Pingdom cycles through its global servers to run checks, the same check servers are reporting the same disruption each time around.
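To illustrate that reasoning, here is a minimal sketch in Python. It is not Pingdom's API and the data is made up: the check log, the probe names, and both helper functions are hypothetical. The idea is that if checks rotate through probe locations and only one location has trouble reaching the server, the failures pile up on that one probe and recur at a fixed interval.

```python
from collections import Counter

# Hypothetical check log: (minute, probe_location, is_up).
# These values are invented for illustration; they are not real monitoring data.
checks = [
    (0, "probe-a", True),
    (1, "probe-b", False),
    (2, "probe-c", True),
    (3, "probe-a", True),
    (4, "probe-b", False),
    (5, "probe-c", True),
    (6, "probe-a", True),
    (7, "probe-b", False),
    (8, "probe-c", True),
]

def failures_by_probe(checks):
    """Count failed checks per probe location."""
    return Counter(probe for _, probe, is_up in checks if not is_up)

def failure_gaps(checks):
    """Return the gaps, in minutes, between consecutive failed checks."""
    failed = sorted(minute for minute, _, is_up in checks if not is_up)
    return [later - earlier for earlier, later in zip(failed, failed[1:])]

print(failures_by_probe(checks))  # failures concentrated on one probe
print(failure_gaps(checks))       # evenly spaced gaps suggest a rotation pattern
```

If every failure lands on the same probe and the gaps are all equal, that points to a routing or regional problem between that probe and the server, not a server that is down for everyone.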

Throughout the day we've done extensive testing and we've ruled out quite a few things.  In fact, we rebooted this server to see if that would help, since prior to the reboot it had been online without a restart for 420 days.  :)

The current status is that the datacenter is working to isolate the cause. 

If you are one of the impacted customers and would like to have your account moved to one of our other servers tonight, please let me know. 

Thank you for your patience.  I'll keep this updated once I have something further to add.

Regards,
Jason

Pam

Thanks Jason for the thread and the update.

We (TFF) are good to remain where we are, but thanks a lot for the offer. :)

SnapHappy

Thanks for the update. I am still not able to access cpanel and am having issues receiving mail.

SnapHappy

Actually I am able to access my site fine. Rick is still not able to access cpanel and is having an issue receiving mail.

Jason

Quote from: SnapHappy on August 06, 2012, 07:41:06 PM
Actually I am able to access my site fine. Rick is still not able to access cpanel and is having an issue receiving mail.

Thanks.  The issue is definitely not resolved yet so once I know more I will post it here.

Regards,
Jason

Jason

Quick update - the server is completely unplugged for the moment while they test some connectivity into it.

All services on this server are temporarily unavailable for everyone.

We're hoping this will only be needed for the next 10 - 15 minutes.

bennettpr

Thanks for keeping us posted Jason - ignore my support ticket then :)

Jason

Thanks. If the outage goes another 10 minutes without overall progress we'll plug it back in and go from there.

Jason

It's coming back up now.  Another few minutes and those of you who weren't previously impacted should be good again.  As to the larger issue, I should have more on that once the tech can follow up with me.

bennettpr

Thanks Jason,

Keen to hear details as they come to hand.

Pam

Quote from: bennettpr on August 06, 2012, 09:02:56 PM
Thanks Jason,

Keen to hear details as they come to hand.

Ditto . . . hoping for some good news.

Jason

Update -- things have been stable for the past 40 minutes or so, but some strange activity on the network card is leading us to believe it's starting to fail.  We're doing an emergency swap on this now.  This will involve approximately 20 - 30 minutes of full outage.

I will report here once this is complete and we can monitor further.

Jason

The NIC swap is complete. 

Sites are loading near instantly again for me.

Is anyone still seeing issues?

Jason

Pingdom has not detected any further disruptions since replacing the network card.

If you see anything unexpected, please let me know.  Thanks!

Pam