Server "Wildfire" has had repeated service disruptions over the past 1-2 weeks, becoming unresponsive after memory spikes.
For most of the day, when everything is running normally, the server performs very well, but it struggles to handle sporadic load spikes. Because of this, I will be upgrading the server tomorrow morning to provide some more breathing room.
This is one of the advantages of our present platform -- I can upgrade fairly easily compared to a physical server migration.
What to expect:
1. When I start the migration, Wildfire will experience the equivalent of a reboot -- roughly 15-30 minutes (max) of service interruption.
2. After the initial outage, the service will be up as normal while all data transfers from the existing server to the new one in the background. This part can take anywhere from a few hours to 10-12+ hours.
3. Once all data has transferred, the server will experience a second and final outage while it essentially restarts again. This sometimes takes longer, but is usually in the 15-30 minute range.
Once complete, that's it. All data will be on the new server and running live without any action needed from you or Charlottezweb.
I will update this post once initiated and as it proceeds tomorrow.
Thanks,
Jason
I am starting the server resize now.
Unfortunately, the server resize failed during the restart step.
I'm on the phone with the datacenter now.
I will keep this thread updated with details as they are learned.
Thank you,
Jason
The current status is that the parent server Wildfire is hosted on needs some configuration changes in order to handle the resize.
As of now, the datacenter is focused on the best approach to get Wildfire up and running as quickly as possible. If they can do this while continuing with the resize, we'll do that. Otherwise, they may cancel the resize and prioritize restoring service to Wildfire first. Hopefully I'll know the answer shortly.
Thank you
Service was restored at 6:50am.
The resize failure is still being investigated. Once the datacenter has confirmed that the root cause has been addressed, we'll look to reschedule -- potentially overnight tonight.
I will keep this thread updated.
This server upgrade/resize is currently on hold.
The datacenter is in the middle of a large-scale network upgrade (related to the emergency maintenance that impacted Thunderbolt this week), which has caused them to lock down some of their server upgrade functions. I'm seeking clarification on whether there's a workaround to complete this sooner.
As soon as we're given the go-ahead, we'll reschedule this upgrade.
Thank you,
Jason