Some time back: A dedicated server I maintain at Webfusion had problems rebooting. No big issue I thought – Webfusion have 24 hour support. What I didn’t realise is that the support is very limited.

When I called I was informed that while I could request a reboot – there was nothing else that could be done with the machine until the next work day. I guess they have a trained monkey that presses the reset button and that’s all. The annoying thing was – this had happened once before during the day. It was back so quickly that I’m sure only a key needed to be pressed.
3 Dec 2008: I receive notification that the servers are being moved to a new data centre. I have just one week to make sure my customer can live without their website as they are switching off the server for up to 9 hours. The mail does say that it will only be off for some of the time – maybe it won’t be so bad. Christ! What would have happened if I had been on holidays?
Tuesday 9 Dec 2008: I remembered the problems that I previously had and made a point of calling them to say there might be an issue when the system is restarted and they just need to do whatever they did the last 2 times. I was told – don’t worry, as soon as the technicians see that it does not respond to pings they will fix it. Looking back now I should have forced them to create a ticket or called again shortly before they turned it off.
Wednesday 10 Dec 2008: The server goes down as expected. I stay at my desk to wait for it to come back online so that the Asian users of the site will have access as soon as possible.
Thursday 11 Dec 2008:
3am: I start to get worried that something is wrong and decide to get some sleep in case I need to be alert the next day.
5am: Wake up and check the status. Still down. The firewall (a separate Cisco device) is online so I know they have done the move.
6am: Wake up and check the status. Still down.
7am: The server is officially due back online but there is no sign of life. I call them and am informed that they will work on it. I ask – what do you mean “will”? I had been told that this would happen automatically. No – a ticket has to be created. I’m still calm. I’m a saint.
8am: No news. I call. I ask explicitly – is there a bigger issue or a backlog that will cause a delay? No – they’re working on it and it will be back soon. I inform my customer that the server will be back soon – they inform their customers.
8:30am -> All day: Repeated calls by me. Repeated answers of “we are working on it”. I’m promised on 3 separate calls that they will email me an update – they are never sent. I’m completely blocked in my attempts to get any sort of info. I’m told there is no way to get the issue prioritised. I remind them that based on previous issues it should be something small. I ask again – is someone really working on it? I feel like an idiot having to keep asking.
Luckily I had the online backup (Note: I own Shercom) with all the config and files. After discussion with the customer we decided to set up an emergency site so their customers are not blocked. The basic site is up and running quickly but there are 8 gigabytes of additional files (about 3 million of them) to be restored. This takes a while.
Evening Time: Finally – after over 10 hours of asking I get a call from the Webfusion Sales Team (not support). They can’t get the system started and they will have to give me a replacement system and attach the old disk so the “files can be transferred”. Being a salesman he words it like I should consider myself lucky since the new system is faster and has more memory etc. He does apologise – for which I’m grateful. He also says I’m going to get the month’s fees credited. I’m also grateful for that. He then says they had 20 systems that would not restart. Now I’m pissed. I stay polite but give him a bit of a bollocking – they should have told me at the beginning that there were bigger issues. I was explicitly told at the beginning of the day that the move went well.
Like an idiot I believe him when he confirms that they will copy the files to a backup folder. At least that way I can configure the new server in the morning. I was wrong again.
Friday 12 Dec 2008: I get a call in the morning to tell me that they need my credit card details before they can start to set up the machine. Grrr. On top of that it will now be my job to copy the files across. I’m told I will receive an automated mail but I’m not to use it until the sales guy contacts me. I make a point of saying that it’s not to be a new contract but rather an extension of the old one.
Saturday 13 Dec 2008: Still no news on the availability of the new server. The killer is – I really wonder did they try and fix the old server? I could easily imagine that when it didn’t start it was just written off.
An email arrives later in the day. My initial hope is wasted: It’s an invoice for money that I was promised I would not be charged AND it says the contract is for 12 months. I feel my saint status slipping away.
At least there is some light at the end of the tunnel – the emergency server is working well.
Monday 15 Dec 2008: I got a call at the end of the day to say that the caddy for the old hard disk is in use with another customer. On the bright side – at least I got the call. I was also told the invoice will be credited back – the one I received is part of the automated system. That I can understand.
Tuesday 16 Dec 2008: At 11:55 I received an email to say the server was connected. Now, after over 5 days of downtime, I can start to copy files and configure the server. At this stage it’s probably too late – my customer and I have lost confidence in the service and the emergency server will most likely become permanent.
Conclusions
- The value of good customer service can’t be overestimated. I’m not asking for a dedicated engineer here – but I should be kept up to date as to the status.
- Systems crash – you must have backups of the files AND the configuration settings.
- If you rely on your website – you need to have a disaster plan in place.
- If you really really rely on your servers then you may need co-location services. This is what Shercom does for the Online Backups. If any of our systems goes down we can get our hands on it and swap out hardware. We’re not waiting for others.
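On the backup point above: one way to check that a backup really covers both the files and the configuration is to compare checksums of the live directory tree against the backup copy. The following is only a minimal Python sketch under my own assumptions. The function names `build_manifest` and `compare_to_backup` are made up for illustration, both trees are assumed to be reachable as local directories, and this is not how Shercom's online backup works internally.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads each file fully into memory; fine for small files,
            # a chunked read would be kinder to very large ones.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_to_backup(live_root, backup_root):
    """Return (missing, changed) relative to the live tree.

    missing: files present on the live system but absent from the backup.
    changed: files present in both whose contents differ.
    """
    live = build_manifest(live_root)
    backup = build_manifest(backup_root)
    missing = sorted(p for p in live if p not in backup)
    changed = sorted(p for p in live if p in backup and live[p] != backup[p])
    return missing, changed
```

Run against, say, the web root and the server's configuration directory, an empty `missing` list is a reasonable sign that a restore onto an emergency server will not come up short.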
Sometime in the near future I’ll put together a post reviewing the hosting companies I know personally and preparing tips on how to select the right one for you.