Today DigitalOcean lost our entire server

This morning I got a mail from the support department of DigitalOcean, which hosts most sites for my company Spatie. (If you’re not familiar with DO-speak, “droplet” is just a synonym for “server”.)

… I’m reaching out regarding your droplet. Earlier today, our Cloud Operations team was alerted to some performance issues affecting the physical server that hosts your droplet and immediately began investigating. Unfortunately, despite their recovery efforts and a filesystem check of the underlying disks, the damage was serious enough that this droplet was lost and not able to be restored.

While our hypervisors are all fully redundant with RAID arrays, we do not additionally backup customer data (unless the user has enabled backups for the droplet, or taken snapshots of their own) for several reasons. One of the main reasons is data privacy; for this reason, it’s expected that each customer will maintain the backup solution that works for their needs and specific situation.

If you did not have a backup or snapshot of the droplet, I’m sorry to let you know that we are not able to recover your data. The droplet’s ID and IP address have been saved, so you can rebuild the droplet if you’d like to keep that same IP address (avoiding any DNS changes), or simply destroy the droplet and create a new one.

We apologize for this situation; it’s obviously a difficult place to be in, and it’s not one that we take lightly, or one without having first tried any recovery methods available to us, before having to give you this bad news. We have gone ahead and granted you credit covering three months of this droplet’s run rate. We understand this doesn’t bring your data back, but we hope it helps as you move forward.

Obviously, this is not a mail you ever want to get. Luckily, we made the decision at Spatie to host every site on its own droplet, so only one site was affected.

When visiting that website it was indeed down. Against my better judgment I tried ssh-ing into the droplet, which of course also failed. So there you have it: one day the droplet is just running fine, the next day it’s gone. All data lost. Poof!

A few minutes after the mail above I received another message from DigitalOcean.

[Image: Screen Shot 2016-02-03 at 22.55.53]

Booyah, indeed. Fifteen dollars is peanuts when you take into account that an entire server has just vanished.

DigitalOcean has a paid backup service that takes weekly snapshots of all droplets. All our droplets use that service. Now was the time to test it out. After I issued the command, restoring the snapshot took about 10 minutes. When the job was finished the server was running again. It had the same IP address as before the crash, and Forge (which we use to provision and manage droplets) could establish a connection again. Unfortunately the snapshot was 7 days old, so all data from the past week was lost. Our client would not have been happy to hear this.

We’re fairly paranoid when it comes to backups and never wanted to put all our eggs in one basket. In addition to the weekly snapshots taken by DO’s backup service, all droplets are copied daily by BackupPC to storage at Amazon. I copied over the files and database dump from that backup to the restored droplet. The result was that, in less than an hour, the site of our client was up again without data loss. Crisis averted.
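As a rough illustration of why the second backup source mattered, here is a minimal sketch that picks the freshest backup available at the moment of failure and reports how much data would be lost. The timestamps and source names are made up to mirror the story; this is not the tooling used during the incident.

```python
from datetime import datetime, timedelta


def freshest_backup(backups, failure_time):
    """Return (name, age) of the most recent backup taken before failure_time."""
    # Only backups that already existed when the server died are usable.
    usable = {name: taken for name, taken in backups.items() if taken <= failure_time}
    if not usable:
        raise ValueError("no backup predates the failure")
    name = max(usable, key=usable.get)
    return name, failure_time - usable[name]


# Hypothetical timestamps, chosen only to resemble the situation described above.
failure = datetime(2016, 2, 3, 3, 0)
backups = {
    "provider weekly snapshot": failure - timedelta(days=7),
    "own daily off-site backup": failure - timedelta(hours=5),
}

name, age = freshest_backup(backups, failure)
print(f"Restore data from: {name} (data loss window: {age})")
```

With only the weekly snapshot in the dictionary, the reported window would grow to a full week, which is the gap the daily off-site copy closed.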

In the afternoon I got a full explanation from DO of why the droplet crashed:

[Image: Screen Shot 2016-02-03 at 23.31.14]

Think about this for a minute: what would you do if one of your servers disappeared right now? I hope you take away from this story that you should always back up your servers. A hardware failure can happen at any given moment. Do not rely solely on backups from your own provider. Take your own backups as well. Use tools like BackupPC, Bacula, or a service like ottomatik.io.

If you’re into Laravel you can also use Spatie’s backup package, which can dump your database and copy it together with all your files to multiple destinations (S3, SFTP, Dropbox, …).
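The "multiple destinations" idea can be sketched in a few lines. The example below is only an illustration in Python, not the API of Spatie's package or of any tool named above: it archives a directory once and copies the archive to several destination directories. The function name `backup_to_destinations` is hypothetical, and the local folders are stand-ins for real off-site targets such as S3 or an SFTP server.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def backup_to_destinations(source_dir, destinations, archive_name="site-backup.tar.gz"):
    """Archive source_dir once, then copy the archive into every destination directory."""
    source_dir = Path(source_dir)
    copies = []
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / archive_name
        # Build a single gzip-compressed tarball of the whole directory.
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(source_dir, arcname=source_dir.name)
        # Copy the same archive to each destination, creating it if needed.
        for dest in destinations:
            dest = Path(dest)
            dest.mkdir(parents=True, exist_ok=True)
            copies.append(Path(shutil.copy2(archive, dest / archive_name)))
    return copies


# Demo with throwaway directories standing in for real off-site destinations.
workdir = Path(tempfile.mkdtemp())
site = workdir / "site"
site.mkdir()
(site / "index.php").write_text("<?php echo 'hello';")

copies = backup_to_destinations(site, [workdir / "offsite-a", workdir / "offsite-b"])
print([str(c) for c in copies])
```

A real setup would also include a database dump in the archive and would run on a schedule; the point here is only that no single destination holds the sole copy.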

Freek Van der Herten is a partner and developer at Spatie, an Antwerp based company that specializes in creating web apps with Laravel. After hours he writes about modern PHP and Laravel on this blog. When not coding he’s probably rehearsing with his kraut rock band. He loves waffles and butterflies.
  • seenu

    Wow, you are still in a good position because of the backups.

    After reading everything, I wonder: can a $5 droplet run sites properly?

    I always use at least a $20 droplet.

    • For smaller sites a $5 droplet is ok. Just add a bit of swap and you’re golden.

      • Also, something Seenu may not know is that you get 4 vCPUs for $20 this way, whilst I think he might get 2. The same RAM and 4 CPUs for 4 distinct sites is a better way of doing things. What would stop you from load balancing all the sites, though, so that downtime was not a thing whilst you got the one damaged node back up?

  • First I was like OMG! How is this possible? Then I was like ok $5 / month is very cheap, is price dumping causing such issues?
    Thanks a lot for sharing. Backups are so important these days!

    • We have had quite a number of droplets at that price point running for a few years now without any trouble. I guess these problems could happen to any droplet, regardless of the size/price.

      • Moazam

        $15 credit is a joke! Imagine if you have sales orders which are not backed up. The way DO handled the situation surprised me. I will avoid DO’s service in the future.

        • My honest opinion is that this can happen with any provider. Don’t rely on your hoster for backups but take matters into your own hands.

          • Moazam

            If my needs are met for $5 I have no reason to pay $20. I know backups are important, but a server failure with no way to recover the data is a disaster, and it’s completely the hosting company’s responsibility.

            Once, a hard drive in my server died and I was able to get the data from the other drive. If DO drives are not RAID protected then I am sorry to say that you are sinking $5/$10 every month. I suggest going for $2/month hosting, which is far better and more secure.

          • DO hypervisors all have RAID arrays. The issue here was a hardware RAID card failure on the hypervisor.

          • kellyzdude

            Presumably the RAID card in question was not replaceable, or stored its data only on board. My experience with older 3Ware cards was that they stored array data both on the card and on the disks such that in the event the RAID card failed, a similar card could be installed and the RAID data imported from the drives. Other RAID cards… not so much.

            I miss 3Ware..

          • kellyzdude

            Disaster can always happen. It’s not the responsibility of the provider to keep and maintain current backups. Virtually every provider will state in their ToS that even if they do keep backups, these can also be lost; the responsibility for backing up data remains with the customer.

            And realistically, if you’re paying $5/mo, you can probably afford a similar service with a different provider and back each one up to the other. If you’re not putting the effort into backing it up, it’s clearly not important enough.

          • Lazy, childish thinking like this is so annoying.

            From the article:
            > While our hypervisors are all fully redundant with RAID arrays, we do not additionally backup customer data (unless the user has enabled backups for the droplet, or taken snapshots of their own)

            They have RAID, but RAID does not prevent everything. For additional backup, put your own process in place, use their backup service, and put your hand in your pocket when you have to, so that these issues can be avoided. They are a hosting company, not a day-care for children with cheap parents.

        • kellyzdude

          While the VM being entirely lost warrants a little more than a simple SLA credit, $15 is also 300% of the monthly fee.

          What would you expect the service provider to do? See point 9 of their Terms/Legal page — like every other provider, you’re expected to keep backups. If you don’t, you’re expected to own that decision also.

          I think a 300% credit on your monthly rate is admirable, albeit a slight punch in the face when taken at its raw value.

          • That last sentence sums up my thoughts exactly.

            I was more annoyed by the unfortunate “Booyah” in the mail than the measly $15.

          • The subject was unfortunate. This is the default for credit grants (which are more often given out in more positive circumstances). This blog post brought this problem to light and the emails are being re-worked.

          • Good to hear that! I hope I will never have to receive those re-worked mails 😛

        • Eoin Prout

          All VM hosts are very clear that a virtual machine can disappear at any time and that you should be prepared.

  • Mohamed Said

    We had a similar issue back in the day, and we weren’t making backups, so we ended up punching ourselves in the face and blaming each other. Since then I have never stopped taking backups of whatever data I care about: websites, photos, articles, everything.

  • Scott Miller

    Losing a single server made a site go down? That takes a pretty epic level of incompetence.

    • I don’t agree with you. Not every site needs things like load balancers and/or failover servers. There are lots of small sites that run just fine on a single machine.

      • Scott Miller

        At least reduce your downtime with monitoring and automatic replacement. You didn’t even know you were down until they told you.

        The #1 rule of web hosting is plan for failure.

        • On this point I agree with you 🙂 And we do have monitoring of the server via New Relic. A NR notification was sent 30 minutes before DO mailed us. The server went down in the middle of the night, so we read both notifications in the morning when we woke up.

          • Scott Miller

            If you are monitoring with New Relic and have several sites, you can save a TON of money by using fewer, large instances and running the sites in Docker Containers. New Relic charges in the realm of $150 per host, so doing multi-tenant saves a lot of cost.

          • Ambroos

            You’d be surprised how useful just the free tier of New Relic is.

          • Ruben

            Can’t agree enough, it’s really useful and gives pretty accurate info about all my servers with the free tier alone.

            And they also have a decent mobile app.

  • Jarland Donnell

    Great post! I’m terribly sorry about this event, but I really enjoyed reading your perspective on this. I’m glad that you use our backup system as well as another. That is always exactly what I recommend. You can never have too many backups, but you can certainly have too few.

    Of course, I wanted to take a moment to say that we will continue to work hard to prevent these types of issues. Certainly a single server is always a potential point of failure and some things cannot be predicted beforehand, but we can always do our best, as well as continue to educate others on the value of backups.

    Thanks for sharing this story and telling others about the importance of backups, as well as giving us the opportunity to view this event through a customer’s eyes. There is much value in both of those things 🙂

    • Mark

      I only have 8 years of experience in the hardware repair market, so I have only replaced RAID cards a handful of times (fewer than 10, in fact), but on all occasions no data was lost.

      You boast that you run RAID-10 configurations. Can you explain how you managed to lose data from a 2-array configuration with a card failure?

      • Jarland Donnell

        Wow so it’s been 6 months and I just now saw this. I’m sorry about that. I’m not one of the engineers at DigitalOcean but I work with our support team and I consider myself to be someone who eats, sleeps, and breathes the hosting industry in itself. With that in mind, I’ve seen three cases where you can lose data in a hardware RAID10 array:

        1. The most obvious, of course, letting a drive fail and being negligent, hoping another certain drive on the other stripe doesn’t fail until you “get around to” replacing the other.

        2. Controller goes out in flames of glory by writing trash to the RAID, effectively demolishing the data.

        3. Import goes badly on replacement controller.

        While I have no recollection after so much time as to which it was, I’d surely remember throwing a fit if it were the first one. Our support team is a bit more involved in the company than a support team traditionally is. Our voices are heard and carry weight. We would never stand for such negligence.

        In the time that I’ve been in the industry I’ve seen 2 and 3 happen probably somewhere in the neighborhood of ~10 times. Of course, as a customer you can never really be too sure who is lying about it really being the first option there. My problem is that I don’t lie, to a fault. I would trade my job for telling the truth. At some point it has to be a personality disorder 😉

  • pizzapanther

    Honestly this is a non-story. When using VMs as a service, it is a given that the machine can get deleted at any point and you should act accordingly. Amazon does this all the time with its EC2 instances. While Heroku is a different level of service, they make it a point to delete your instances every once in a while so you don’t get used to storing persistent data on them.

    Now a lot of people use Digital Ocean because they stay up much longer than Amazon and provide better persistence and reliability. But you shouldn’t bet on it.

    • I disagree that it’s a non-story. I think it’s a great story that really illustrates the importance of preparing for failure and keeping backups. How many people do you think read this, went “oh crap” and set up some sort of backup solution or redundancy?

      • Eoin Prout

        Reminds me of the story about someone running a bitcoin exchange on EC2, and losing everyone’s bitcoins when he terminated the instance. He complained

      • pizzapanther

        Good point. The story is fairly dramatic and kind of blames DO. That is the non-story. “Oh crap I messed up and this is a PSA for everyone” would have been a better tone.

        • I think the title is def clickbaity and may lead people to jump to the wrong conclusions, but the story has a good lesson to it.

          • My goal was to raise awareness for the problem without necessarily blaming DO. This kind of trouble can happen at any cloud provider.

            And I’m guilty as charged: the link is indeed a bit clickbaity. 🙂

      • I didn’t like the story initially, but from the comments I can see this is more of a “look at us”, advertisement showing why and how they provide additional value, than a “DO sucks” laziness post.

        I also think pizzapanther has a point. There are a lot of untrained people playing at IT, and a lot more people who have a system in theory to work around. I think this should be, but sadly is not a core expectation of service, but as it’s not, it is a key differentiator for those of us with backup and redundancy strategies.

        I would have loved it a little more if the strategy was that all nodes were balanced, so one node failing did not affect everything, or a cloudflare was used, and even if regular non-DO backups were a thing.

  • Doug Smith

    This happens normally with EC2 instances and you don’t get $15.00 credit, just a sorry note.

  • carlivar

    This seems like “duh”.

  • Joke Forment

    I hope this will never happen to us

  • Mark Hahn

    How does their explanation make sense? Losing a RAID card shouldn’t be a big problem: just replace it (with the old disks) and you should be up right where you were. Or are they implying that the controller *and* the disks were destroyed?

    • Sometimes a raid controller failure can also cause disk corruption. Unfortunately, a lot can go wrong that makes this not as simple as it sounds. 🙁

      • Mark Hahn

        Or maybe little blue men on Venus caused the server to implode. Seriously, filesystems are designed to recover from pretty serious controller malfunctions. The worrisome thing here is that DO seems to be doing something wrong, since this sort of thing doesn’t happen to other people (and “other” outnumbers DO by many orders of magnitude.)

        If a “malfunctioning” controller really managed to destroy all data on all disks, such that no fsck-like recovery was possible, well, it would be GREAT for DO to tell us what make&model it was…

  • MatthewHager

    We were loyal customers of DigitalOcean for over 2 years. We showed up to work one morning and had a client email stating that their website was down. We checked, and sure enough, it was. We tried logging in to our DO account and it said it was suspended. After searching around, we realized our CC had expired. No biggie, we’ll just update it, turn the server back on and be on our way.

    Nope. We had to contact customer service to get back in to our account. After updating the card info we realized that all our droplets were gone. We reached out to customer service again, at which point they let us know that when a CC expires, an automated process kicks off and deletes the droplets. We’ve been a customer for 2 years, surely they could pick up the phone and call us. We’ve spent thousands of dollars with them.

    Plan B, let’s restore them from the backups we’ve been paying for. Nope! When they delete your droplets, they also delete your backups.

    This is where we ask to speak to them on the phone and are denied. I then ask if they are insane and why they would delete someone’s servers and backups via an automated process without a human at least checking to see if it is a loyal multi year customer who has a simple lapse with their CC expiry.

    We had to find a backup on a developer’s machine from over a month prior and rebuild data using various methods that took close to a week. In the end, DO gave us a $500 credit.

    You get what you pay for. Anyone who uses DO for production or anything more than a hobby app is playing with fire. They do not care, they are apathetic, they will screw you over and then throw you a credit for their terrible service as a half-assed apology.

    • Wow, that’s quite a story. I surely hope they have changed their policy since then. The fact that they deleted the backups as well is not so good, to say the least.

    • Zach Bouzan-Kaloustian

      Hey Matthew,

      I’m Zach, Director of Support here at DigitalOcean. Thanks for raising this topic.

      I was able to locate your account, and I see that I was the one who granted the credit and followed up via email. I hadn’t heard back from you until now, and I’m happy that we have an open line of communication. I’m hopeful that this thread starts a conversation, that it clarifies what steps we take, and we might even uncover a different solution that works well for everyone.

      To start with, I want to be really clear, this is the absolute worst case scenario. I absolutely don’t want it to happen to any customer, let alone someone who has been with us for such a long time. There’s really a delicate balance that we must strike between customers who forget to pay and customers who do not want to pay.

      Currently, our notifications for overdue balances are sent via email. If a customer account is unpaid on the first of the month, we send ~15 emails total notifying the customer of the situation, and subsequently power off servers 21 days after the account is on hold as a further way to gain your attention. At this time, droplets are removed from your account 14 days following power off, which is an increased amount of time from what you experienced. It was increased from 3 days based on past user feedback.

      Why do we do it this way? In the past, we had no hold or suspension process for non-payment, which enabled bad actors to run for months and months without paying. From a business perspective, we made a decision to put a scalable process in place that limited how long an account could go unpaid.

      As a support team and business, we are always willing to work with our customers who are unable to pay. We are always available via ticket, our contact form, and have made wide-ranging attempts to help users who aren’t able to pay due to banking regulations: https://insidedigitalocean.com/showing-love-to-greece-b15aa9e98275#.xyj2pyc5m

      I’d like some input, so I’ll publicly ask for feedback on questions that I’ve asked privately before. I’d also like any other feedback that we can consider on how to make this a more positive experience for everyone.

      -Knowing that we do not do phone support, what’s the best way to notify you of a past-due balance?
      -Is SMS effective at times like this?

      I would love to hear your thoughts.

      Thank you,
      [email protected]

      • Ariel Barreiro

        I got here from HN and at least it’s good to see that you took the time to post here. I am a happy customer so far but this raises concern. So to understand this properly, this means that once a payment has failed you have 21+14 days to sort that out and then the droplets are lost.

        OK.. my 2 cents, put a process in place to warn customers BEFORE the expiration date, enough time in advance. Say 2-6 months, even several times.

        • Zach Bouzan-Kaloustian

          Hey Ariel — Thanks for the comment.

          I want to make sure I understand your request fully. Are you asking for 2-6 months notice for unpaid balances before anything on your account is disabled/impacted? If so, what’s the basis for that amount of time and if we could only choose one amount of time, what would you request?

          Does it change knowing that if you contact support we’ll provide an extension of time for you?

          • Ariel Barreiro

            No, I didn’t mean for an unpaid balance. I mean a process to let customers know that a payment method is about to expire. If the card expires in March 2016, you can start sending reminders to update the card in January or before. You normally get a new card before the expiration.

          • Zach Bouzan-Kaloustian

            Great, that clarification helps a ton! It seems reasonable to notify customers. I’ll make sure to share this idea with our product team to see what we can do with it. As with anything, no promises, but this certainly makes sense to me.

          • I’m a DO customer and I’d rather you did not do more to put yourselves at financial risk for what is frankly laziness on the part of the customer.

            If they miss a bill, 14 days without communication should be enough to terminate. Sure, if they communicate, you have to have a process for one-off exceptions. I was once a day late: I sent an email apologizing, made an ad-hoc payment, and contacted support asking how to avoid it in future.

            You should not prioritize people that do not prioritize your business!

          • MatthewHager

            Hey Lewis,

            I was a happy paying customer for years and would have continued to be a happy paying customer for many years to come. It seems you didn’t read my story and what happened. My CC expired (it had nothing to do with me not wanting to pay). 3 days after, they deleted everything, INCLUDING THE BACKUPS.

            Let’s see, 2 years a happy paying customer. 3 DAYS after a simple CC expiry, EVERYTHING DELETED!

            There is no excuse for this behavior on their part. When you pay someone thousands of dollars and pay on time for years, when a simple CC expiry occurs, they should do you the courtesy of a phone call. I think by that point I’ve paid them enough money to get a phone call.

            Second, under no circumstances whatsoever should they delete backups or they should at least hold on to them for some period of time in case there was a mistake.

            I hope you TL;DR my post and aren’t really so crazy that you think their processes/behavior in my case was truly excusable to the point that you are giving them a kudos over it.

          • Matthew, you are saying 3 days after CC expired, but from their perspective they have said they wait 14 days before deleting anything.

            For me it would not matter; I generally suspend any non-payer immediately if they have not phoned, or e-mailed me in advance of missed payment, then I assume they have either not planned properly, or do not require service, or don’t value it.

            Whilst I don’t delete their stuff immediately if I got an e-mail from one of them that did not start with sorry, and end with payment, or time-line for payment I’d sure as shit delete everything, at the very least I’d remove any and all prompt-payment discounts as per my terms and conditions.

            It’s not just about being a dick. If I am late to the bank, I get bank charges, the fact I plan enough not to allow other activity to get me bank charges is kudos to me, not to the customers. I give everyone a discount for prompt-payment, so if they pay on the day due or after their bill goes up. Past the point of ignoring the carrot, I fetch the stick immediately.

            I don’t feel sorry for many businesses that have unpaid balances; they have made a conscious choice to undervalue their own services. And I don’t feel sorry for customers that have not paid, as everyone that has is being asked to support them, so I fix the problem. At the least they won’t have service until they pay; at the most their service, the files they are paying for, gets deleted.

            I think as a business you have to assume that if someone wants to be a customer, then they want to pay, if they don’t, or want to become a difficult customer, I would sooner make easier money elsewhere working with people that value the services I supply.

            On the potential damage of DigitalOcean deleting everything: take regular backups and use scripts to deploy systems!

            For one thing I script all server work, deploy, backup, restore, all scripted, so if I did not pay for a long period of time, yes I would be fine with DigitalOcean suspending or deleting my files. If they accidentally delete a droplet, I have at most lost one day of work that I should be able to recover from articles of service, such as invoices, e-mails, letters.

            Lastly, two things. Because my customers maintain their own DigitalOcean account, it doesn’t matter to my business if this happens (in fact if they are not on retainer for server maintenance, or go over I get paid); then it’s between DigitalOcean and the client, which is a barrier I’d encourage others to put up. Secondly they have a low-price point, and high value to customer as the service is easy to use, performant, simple; and they are friendly, we cannot expect them to provide what a supplier that charges more charges, it’s like paying for a kids bowl of cereal and complaining it’s not filling enough. I have had customers on Linode, AppFog, Amazon, MediaTemple, Dreamhost, they are all just vendors, they all make mistakes, and the really skilled amongst us will know this, and setup a process for when things do go wrong.

          • MatthewHager

            You should get a job at DigitalOcean. Y’all have similar philosophies on customer satisfaction.

          • At the point you can’t, haven’t, or won’t pay, you’re not a customer, and you are not in a position of being wanted.

            Ways to not be late with a bill:

            * Set up account alerts, and undertake short or long-term financial planning
            * Set reminders for card expiration dates
            * Set up an account buffer (you can pay before you have to)
            * Buy advanced terms of service (if allowed)
            * Set up account credit terms (if allowed)
            * Make automated payments via standing order, automated wire, or Direct Debit.

            Valuable customers are like valuable suppliers: they pay early; they say thank you; you have conversations you are not charging for; you get invited to their weddings; they ask you about and involve you in other businesses they start, and pay you, because they value you; they ask about services that are “not on the menu” because they trust you; they take you and your wife / partner out for a meal; you see a theatre show together; they tell you when they have had a baby, and you ask how everything is at home, because you care!

            All these things are true of my business, which is small, which is a decision. The absolute minimum for being a good customer is paying your bill, or being courteous enough to make a call, or send an e-mail saying why you may not have the money at the time it’s needed, provide a time-line for payment, and take steps to ensure it’s not a regular occurrence.

            I won’t worry about losing a customer that doesn’t fit this mould, because you can find them anywhere; there are 7 billion people on the planet, and I respect myself enough to not deal with the ones that behave rudely or disrespectfully, people that undervalue me, or make me or my staff or partners of my business feel bad; because I bring value, and I like to think as a business owner you’d aspire to bring value too.

            Other than reading your self-entitled post, I’ve had an awesome weekend, I hope you have too, and that you do re-evaluate your take on life. Suppliers and other stakeholders will thank you, you’ll get the best deals, it works, just try it!

          • MatthewHager

            I know that painting me to be a bad customer helps your post, but I paid on time for over two years and had a simple CC update that caused a payment lapse, ending in all my servers, data and backups being automatically deleted without any human on their end making this decision. I’m sorry that feeling like what happened was wrong makes me seem entitled and posting about it has made your weekend worse.

            When I don’t get paid, I call and ask if they got my invoice and follow that up with when I should expect payment. In over 8 years in business, only a couple times have I not gotten paid and had to turn off services. In both cases, I still have the customer’s data.

          • Bjoern

            Hey Matthew,

            I just wanted to leave you a reply to your statement:

            “You get what you pay for. Anyone who uses DO for production or anything more than a hobby app is playing with fire. They do not care, they are apathetic, they will screw you over and then throw you a credit for their terrible service as a half-assed apology.”

            I know you are frustrated, but I think you should take a deep breath and also think about what ultimately led to what happened to you.

            Your inability to inform yourself properly about how DO handles CC Expiration and having a valid payment method available.

            “When I don’t get paid, I call and ask if they got my invoice and follow that up with when I should expect payment.”

            If you have 50 or 100 clients that may be ok. But not if you have, I don’t know how many clients DO currently has. But for a company of that size you need to automate the process.

            What happened to you is something you can’t blame on anyone other than yourself. If I have clients that I have a contractual obligation to because they pay me for hosting their projects / apps / websites or whatever it is that you do, then I make sure I talked to DO about my concerns and covered every worst case scenario I can.

            What happens if my Server gets destroyed in your Datacenter, not only by hardware failure but physically?

            Are Backups kept on Servers in the same Datacenter as my VPS or where do you store them?

            What happens when my payment method expires and I did not update it beforehand?

            Those are just a few questions from the catalogue I asked them before signing up and moving to DO two years ago.

            DO is still a young company and shit happens. Yes, it’s unfortunate what happened to you and I am pretty sure that your case may have been one of the reasons they are regularly reviewing internally how to handle situations like yours. But I think your final judgement is unjustified.

            By contrast, here is what happens when you actively talk to DO once you know a problem is coming.

            Two of my close relatives died on their overseas holiday in China after an accident. After talking to the embassy and the Chinese police I had a close estimate of repatriation and all other costs to get them back home.

            There was no way I could pay all outstanding bills and get them back home at the same time. So I wrote to DO and the other companies I have financial obligations with, asking if we could find a solution, because I would not be able to pay them on time for at least a month or two, and explained why.

            Within five minutes I had an answer from DO and they had me covered and told me not to worry and take care of my family first.

            This is not what a company does that doesn’t care about their customers. They could just have said: your problem, pay the bill or face the consequences.

            So instead of calling them out for bad service, you should accept that you created your problem in the first place and learn from your mistake.

          • MatthewHager

            They handled your situation well. They didn’t handle mine well. Just because they handled your situation well, doesn’t justify what happened in my case.

            I could go and argue that their size (being much larger) gives them more opportunity to handle situations like this. AWS has reached out multiple times to handle various situations and you can get them on the phone. Also, them being young isn’t an excuse. I’ve noticed that many young startups are banking on having great customer experience and are super accessible. DO is choosing a “never talk to the customer” type approach and they sure didn’t advertise that during sign up. They should add to their FAQ on the signup page that they delete everything, including your backups, if you don’t pay on time. This won’t increase signups, so I’m sure they won’t.

            I’m happy they handled your case well. It’s good that you posted your positive experience to balance out my negative experience. I don’t think one of us is right and the other wrong. I think they deserve kudos for what they did for you. I think they should immediately rethink their account suspension policies.

            P.S. This happened to me months ago. They are just now asking for advice on how to make this all better because this post made it on to hacker news. They’ve still not fixed the issue.

          • danneu

            That’s some hand-wavy cynical nonsense right there. It’s like defending why you never give anyone the benefit of the doubt because you might be right once.

            Some companies help you out when shit happens, some don’t. These posts illuminate that DigitalOcean doesn’t.

            You and your business might not help either in these rare events, but there’s no need to justify yourself. Just make it obvious that this is how you operate so people can avoid you.

          • Coreinsanity

            Honestly I think all of you need to have a bit more understanding. Sure, this guy could have done things to prevent this on his own. No doubt.

            That being said, cards take years to expire. Things fall through the cracks and get forgotten (i.e. updating the card). Maybe things are hectic, maybe he just forgot, maybe he had a rough time? Perhaps someone stole his card and he’s having to go through the PITA process of taking care of identity theft and this just gets forgotten temporarily? Is that DO’s fault? No. But at the same time, understanding is nice.

            You talk about wanting relationships on a personal level with your customers, but at the same time treat this person, who has a completely understandable situation, like he’s trying to intentionally go out of his way to screw over DO because he’s upset about his CC expiring and his stuff getting deleted?

            I’d be livid if I was him. At least set up text notifications for things like this. Frankly if the alternative is deletion of all my stuff, I’d like to get spammed on my phone and e-mail.

            The problem here is bigger than him simply not updating his stuff. It’s the fact that because of how they did things, a set of circumstances (a “perfect storm” if you will) could completely screw over one of their customers who’s doing nothing wrong and has bad timing with their service and other things going on. That’s my issue with it.

            Again, not saying DO is entirely to blame, but automatically deleting anything of this magnitude should be done with extreme caution and as a completely last resort with lots of effective notifications. Your servers should go down, you log in to see why and it should FORCE you to see the notifications about your card and late payments. If after ~30 days, you don’t log in, or deal with the situation after you do, THEN delete the VMs. 30 days after that, delete the backups. Especially if it’s VMs with actual traffic, and a decently long standing, on time paying customer.
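
            The staged policy proposed above can be sketched roughly as follows. This is only an illustration of the commenter's suggestion, not how DO's billing system works: the `Action` names, the `next_action` function and the two 30-day thresholds are assumptions taken from the "~30 days, then 30 days after that" wording.

```python
from datetime import date
from enum import Enum


class Action(Enum):
    NONE = "account in good standing"
    POWER_OFF = "power off droplets and keep notifying the customer"
    DELETE_VMS = "delete droplets, keep backups"
    DELETE_BACKUPS = "delete backups"


# Hypothetical grace periods, taken from the comment above.
VM_GRACE_DAYS = 30       # droplets stay recoverable this long after a failed payment
BACKUP_GRACE_DAYS = 30   # backups survive this much longer after the droplets are gone


def next_action(payment_failed_on: date, today: date, paid: bool) -> Action:
    """Return the most destructive step allowed for an overdue account."""
    if paid:
        return Action.NONE
    overdue_days = (today - payment_failed_on).days
    if overdue_days < 0:
        raise ValueError("today is before the failed payment date")
    if overdue_days < VM_GRACE_DAYS:
        return Action.POWER_OFF
    if overdue_days < VM_GRACE_DAYS + BACKUP_GRACE_DAYS:
        return Action.DELETE_VMS
    return Action.DELETE_BACKUPS
```

            The point of the design is that each destructive step sits behind its own waiting period, so one missed payment can never take out droplets and backups in the same run.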

          • To be honest, if you want to operate in that way, you go do it. I have no objection to your businesses operating in that way, but you will not be getting my money, or me cheer-leading the movement, as I do not feel I should pay more to support your business or misplaced philanthropic endeavours.

          • Coreinsanity

            Oh please. I wouldn’t have you cheer-leading for anything. You claim in one hand to want customers who value you, and that you want a personal relationship. Yet frankly that comes off as one-sided when you would just about immediately throw out their stuff if they don’t pay once within a day, even if they are a long standing customer.

            The thing you don’t seem to get is that it’s as much a business strategy as anything else. While this “bad press” may not be destroying DO, it’s certainly not good when you read headlines about your entire VM getting destroyed because of hardware failure, or your card expired and they deleted all of your stuff.

            You don’t do it strictly to benefit the people, you do it to make sure you don’t fuck up and delete something by mistake. That’s why I say you don’t get some automated system deleting droplets THREE DAYS after the power off, JUST IN CASE something messed up and they couldn’t deal with it. Automated systems have flaws.

            But you strictly deal in absolutes. To you Matthew is a whiny self-entitled bum who deserves no sympathy, should be better organized, and even though he was a loyal customer for years spending a decent amount of money gets no understanding at all over one slip up. I’m apparently some philanthropist (which if you knew me, is a load of crap…)

            It’s not about philanthropy, it’s about building a relationship, and building a reputation for putting the customer in a place of importance. If the cost of that is prohibitive, so be it, don’t do it. But this seems less of an issue of cost-prohibitive concern and more a case of they just didn’t really think the system through the first time around. They did change it after all, or are going to.

      • MatthewHager

        First, you shouldn’t suspend an account, delete the droplets and delete the backups in one automated process. That is insane! I don’t even know what your team was thinking when they wrote those lines of code. The fact that you’ve been in business as long as you have with the possibility of an automated action 3 days after a CC expiring completely destroying everything (including the backups) to where it isn’t recoverable is disturbing to say the least.

        Second, once you completely screwed us over, you could have at least done us the courtesy of a phone call. You claimed that this was a pretty rare incident. You were responding to our tickets as we put them in and it was during normal hours. If you somehow concocted the Silicon Valley startup dream of an office without any phones, you could use your cell phone and call us like we requested many times. I know on your end you have a million more customers, but to us (a happy paying customer for many years) you just destroyed without the possibility of recovery many years worth of code and data. This shows just how apathetic and allergic to your customers you are. Only now that someone is posting this stuff online do you act like you are going to change process and ask questions about how you can do better.

        “Is SMS effective at times like this” – anything is better than what you put us through.

        “Knowing we do not do phone support” – Our developers could have programmed you a system to call your customers with a recorded message using Twilio in less time than it took us to recover from this event.

        How can you make it better? Give a damn. Give a damn even when it isn’t public.

        • Lasse Rafn

          4 months late to comment..

          I can confirm that DigitalOcean sends TONS of e-mails. We had this thing happen because we didn’t check the e-mails (nor servers) for roughly 2 months, and the bank account was empty. I think that it’s enough time for the average customer to notice. The inbox was BOMBED with mails saying we will be shut down and deleted if we do not pay.

      • In my opinion, suspending the account is justified in case of a failed payment. The droplets may be turned off, but the data should be left for a while longer. And the backups should be left alone even if the droplets are deleted for at least 6 months.
        You could probably allow access to the backup after outstanding bills are paid off.

        11 months later, I wonder if this is being handled in a better way now.

    • danneu

      Happened to me on another webhost, except it was a dedicated server. They parted out my server, backup server, and all the hard drives after a charge on my card failed. I was a customer for five years. It ruined a seven year old forum and I don’t think it’ll ever recover.

      I learned my lesson about having off-site backups, but it doesn’t excuse this behavior. Especially if you’re going to provide a backup system that I’m paying extra for, why wouldn’t you file away my hard drive until you hear from me? At least for a month.

      • MatthewHager

        I totally agree. I wish my story could have gone like the OP’s ending in recovering from backups. I get it, CC lapsed, shut my stuff off. Just give me some way to recover. The backup deletion truly is where they went from apathetic but understandable to insane.

    • This is crazy, I doubt dead droplets cost that much, and either way, backups are on slow inexpensive storage… Although we have daily backups, it would be a PR nightmare explaining all the downtime for a missed CC date… I don’t see the point of backups if they get deleted together with the droplet. What if a hacker gets into my account? … Glad we did not go with DO 100% but split options around.

  • Shawn

    Good article, and I’m glad you followed it up with your positive results of your droplet restore. Reading the title and getting into it, I didn’t want this to be a nasty article about poor performance and Digital Ocean bashing. For what you pay, it’s an incredible offering. If you want to minimize this kind of thing in the future, you can very cheaply add a 2nd droplet in another one of their datacenters, front it with either DNS round robin, or better yet, a $5 droplet load balancer. Now you’re protected against downtime as well as a destroyed droplet.
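
    The two-droplet setup suggested above boils down to rotating requests over the droplets and skipping any that is down. A minimal sketch of that idea, assuming a hypothetical `is_healthy` check and made-up documentation-range addresses (a real setup would use DNS or a load balancer rather than this loop):

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator


def healthy_round_robin(
    backends: Iterable[str], is_healthy: Callable[[str], bool]
) -> Iterator[str]:
    """Yield backends in rotation, skipping the ones that fail the health check."""
    pool = list(backends)
    if not pool:
        raise ValueError("at least one backend is required")
    ring = cycle(pool)
    while True:
        # Try each backend at most once per pick before giving up.
        for _ in range(len(pool)):
            candidate = next(ring)
            if is_healthy(candidate):
                yield candidate
                break
        else:
            raise RuntimeError("no healthy backend available")


# Hypothetical droplets in two different datacenters.
droplets = ["203.0.113.10", "198.51.100.20"]
lost = set()  # addresses of droplets that are currently down
picker = healthy_round_robin(droplets, lambda address: address not in lost)
```

    If one droplet is lost, its address goes into `lost` and every following pick lands on the surviving one, which is the protection against a destroyed droplet the comment describes.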

    • Those are good tips.

      And indeed, it wasn’t my intention to bash on DO. Unless they are going to lose droplets frequently we will keep on relying on them.

      It doesn’t really matter which provider you’re on, hardware failures can always happen.

  • GusDeCooL

    Awww sorry to hear about your server. But it looks like your server is still small, considering the price is only $5/month. No need for balancing or any complex infrastructure that will cost you more money.

    But yeah, I do agree: always always always have an external backup service.

  • Joel Bondurant

    There’s a magical backup solution for servers called git. If you can’t redeploy a new server in a few minutes, that’s not on DigitalOcean.

    • In most cases you don’t want to store things such as databases, uploaded media or environment configuration in a git repo. So solely relying on git isn’t enough imho.

  • TimVD

    I feel sorry for what happened but I’m sure you learned a lot from it.

    One should not blame Digital Ocean too much about this. Given the prices they charge they give a pretty good bang for your buck and things like this don’t seem to happen often.
    A VPS will always be a DIY solution, people who get scared about these things or don’t want to take precautions should look at managed solutions instead.

    • I totally agree with you. We were prepared for this kind of trouble. My goal in writing this article was not to point a finger at DO, but to give a cautionary tale why taking your own backups is mandatory, even when using cloud hosting.

      You know they say that the cloud is just someone else’s computer. Well, that computer can crash too.

  • Walter

    Great. Absolutely golden for anyone or company hosting on DO.

  • Christophe Courtaut

    You should not rely on your cloud provider to provide you with “pet” servers; this is the old, wrong way of doing things. Cloud providers provide you with “cattle” servers that might go down, or even be lost, and you have to deal with it.
    You have to make your services highly available and/or consistent if that is a requirement.
    And you should have a plan for DR.
    If you are hosting a website in a single droplet, you are the one who designed a system with a single point of failure.


  • I’m sorry to hear this… but at the same time, you should be doing backups.

    I mean, I was with a very important hosting company and the data center literally caught fire.

    I wasn’t expecting them to have backups. Fortunately they did, but I already had a backup up and running in less than an hour. Then they restored their service and my backup was no longer needed. Those are good practices. You should embrace them.

    Totally your fault. And Digital Ocean has backups, which are hosted on Amazon.

  • A great line from last year’s Chefconf that applies here: “Remember, VMs are cattle, NOT pets!”. If your infrastructure isn’t repeatable, with respect, you’re not doing it right.

  • Stephen

    Do you know what brand of disks they were using?

    We found some disks have such a high failure rate, we now use HGST disks and with over 250 in use have not had a single failure in two years.

    • This does not mean anything; plan for the worst case, and put processes or mitigations in place.

  • Food for thought, Thanks for Sharing. Really gonna rethink some disaster recovery strategy

  • Sal F

    My question to DO:

    Why weren’t you using a shared SAN backend? This is a storage problem that can happen on literally every one of your hosts.

    I understand we all need backups, but is the “cheap” rate due to cheap infrastructure?

    • To be fair, SANs can and do fail even more spectacularly. They’re far from a magic bullet that can prevent issues like this.

  • Very nice, love the contingency plans!

    All of my sites’ codebases are handled via git deployment, I run daily database backups with tools like Navicat, and I’ve stopped allowing clients to upload data to the same filesystem the site sits on and instead shunt them to an S3 bucket or Dropbox-like service.

    In the event anything goes to hell, it’s trivial to spin up another instance, git pull, and execute a db backup file 🙂
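
    A daily backup routine like the one above only helps if the newest backup is actually recent. A small sketch of such a freshness check, assuming you can list the timestamps of your backup files; the `backup_is_fresh` name and the 25-hour default are illustrative assumptions, not part of any tool mentioned here:

```python
from datetime import datetime, timedelta
from typing import Iterable


def backup_is_fresh(
    backup_times: Iterable[datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=25),  # daily schedule plus one hour of slack
) -> bool:
    """Return True if the newest backup is no older than max_age."""
    newest = max(backup_times, default=None)
    if newest is None:
        return False  # no backups at all counts as stale
    return now - newest <= max_age
```

    Running a check like this from a separate machine means a silently failing backup job gets noticed before the day you need to restore.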

  • Gio

    Whaah? I thought one of the purposes of cloud was that you are insured against hardware failures, guess not

  • Lasse Rafn

    I’m already backing up the entire DB every day, using Spatie backup (Y) All backups are saved at AWS and I download the latest, to my local computer, every week, in case shit goes down.

  • Thanks for the article. A customer has complained about communication loss, and it seems to be around the time of the backup, as seen in Graphs. Have you noticed anything similar to that? We use an app for the accesses, and it times out if it takes longer than 10 seconds to connect, which of course is normally fine.

    Thanks in advance.

  • To avoid this, and to enable me to sleep at night, I went for a middleman approach, with Cloudways handling the everyday admin, still hosting at DigitalOcean. If a server instance would completely die on me, I only have to wire up a new and identical app instance (takes a few minutes) and dump my last backup there. I don’t have to manually work with server configuration. I know people that consider this kind of “hosting without hosting” as snake oil, but I’m no Linux expert, so this was the only realistic solution for me. Cloudways also takes care of e-mail hosting via Rackspace, and server monitoring is excellent. What I lack is redundancy, but that’s not an issue at this stage.

  • Excellent post. We have been using DO for about a year and a half at iDZYNS and have loved the service. All of our WP sites run the DO backups, plus we do full site/db backups to Amazon S3. We’ve been looking for a solution for our apps and sites running Laravel, where we cannot use a plugin.
    We love the S3 service for its affordability.
