Hard disk reliability study - 2005-2020

As you already know, I like to do long-term tests and reviews of hardware and software that I use. Over the years, I've given you my take on how different operating systems progress and change, how different laptops cope with the passage of time, and now, I want to embark on my most ambitious long-term project yet. A reliability study of hard disks. I've waited fifteen years to publish it.

I waited because I needed time to gather data that has value to the readers. Unlike Google and Backblaze, I don't have thousands of disks buzzing in a data center, so I couldn't provide meaningful results quickly. But I think you will find this study valuable, as it took place in my production setup, under real-life conditions most home users could or would encounter.

Hard disk usage conditions

To make this relevant, it is important for you to understand how my setup is wired:

  • The table below does not include every device I have used in the past 15 years. For example, I've not listed a few ThinkPads I had (such as the T61 or T400), because they were used for purposes outside the scope of this study, and I excluded them so as not to taint the results. If anything, their inclusion would only make the results look better. Among the different laptops I used, only one had a disk failure (an old T42p, when it was already long in the tooth), and all others finished their useful service without any issues. So this table is conservatively optimistic.
  • All permanently fixed/used disks are attached to computers that are powered by Uninterruptible Power Supply (UPS). This also includes external hard disks that come with their own power supply - typically the large external enclosures (e.g. WD My Book).
  • Laptop hard disks are used with the battery present in the laptop chassis.
  • Most of the disks in my setup show a temperature of 35-45 degrees Celsius.
  • I am partial to Western Digital hard disks, which is why they are the majority of listed devices.
  • I have listed mechanical disks only; no SSDs. I don't have enough information on SSDs yet.
  • Only hard disks used for more than six months are listed.

Legend

The table below summarizes my findings. Now, here are some explanations before we delve deeper.

  • Disks are sorted by year (third column - From) - when they were first introduced into my setup.
  • To - indicates the current date and state. Now means the disk is still in use. A year means the date of decommission.
  • Type - We have (D)esktop, (L)aptop and (E)xternal hard disk (all powered by USB).
  • Usage - Denotes how the disk was/is used. 24/7 indicates a device that is constantly on. X/M denotes a disk that is used periodically, with X being the number of days the device is used in a month, on average. For example, 1/M is a disk that is used for one day per month, or 12 days a year. This could be 12 days used in a row, or powered on 20-30 times for several hours. 30/M means daily usage but not 24/7. I cannot be ultra-precise when it comes to external hard disks.
  • Result - OK means the disk has been decommissioned in a healthy state or it is currently in a healthy state. OK(b) means OK but. In this case, we're talking about a disk that is working but has errors. F means the disk has failed. DOA means Dead On Arrival (purchased faulty).
  • Notes - if a multiplier is used (Identical xNumber), it means there are that many identical disks in the setup, with the same results (makes the table shorter and easier to read).
  • I did not list the exact models for my hardware, because there are tons of tiny variations between different models (like the year they were produced, the fab, the batch, the exact firmware version, and so on). For laptops, I used whatever information was available to determine the hardware.
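
The Usage notation from the legend can be turned into approximate powered-on days per year. This is only a minimal sketch: the function name usage_days_per_year is hypothetical, and treating 24/7 as 365 days and X/M as X times 12 is a simplifying assumption based on the "1/M is 12 days a year" example above.

```python
def usage_days_per_year(usage: str) -> int:
    """Convert a Usage code from the legend into approximate powered-on days per year."""
    if usage == "24/7":
        return 365  # constantly on
    days, unit = usage.split("/")
    if unit != "M":
        raise ValueError(f"unknown usage code: {usage!r}")
    return int(days) * 12  # X days a month, 12 months a year


for code in ("24/7", "30/M", "10/M", "1/M"):
    print(code, "->", usage_days_per_year(code), "days/year")
```

Note that 30/M comes out close to a full year of days, but per the legend those are days of use, not days of continuous 24-hour operation.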

Results

And this is what we have:

Disk | Size (GB) | From | To | Type | Usage | Result | Notes
--- | --- | --- | --- | --- | --- | --- | ---
WD Black | 200 | 2005 | 2011 | D | 24/7 | OK(b) | Had a likely imminent fail SMART error but continued working for 1+ year without issues
WD Black | 160 | 2005 | 2009 | D | 24/7 | F | Failed without prior warning
WD Black | 250 | 2006 | 2012 | D | 24/7 | OK | Identical x2
Hitachi | 160 | 2008 | Now | E | 1/M | OK | Used in custom enclosure
WD My Passport | 250 | 2008 | Now | E | 1/M | OK |
WD My Book | 500 | 2008 | 2018 | E | 24/7 | F | Became inaccessible without prior warning
Laptop disk | 320 | 2009 | Now | L | 3/M | OK |
Toshiba | 320 | 2009 | Now | L | 5/M | OK |
WD Black | 250 | 2009 | 2011 | D | 24/7 | OK |
WD My Book | 500 | 2009 | Now | E | 24/7 | OK |
WD My Book | 1000 | 2009 | 2017 | E | 24/7 | F | Would click on spin-up since day 0; exhibited heating and no spin-down two years before failure; became read-only
Laptop disk | 500 | 2010 | Now | L | 5/M | OK |
WD My Passport | 640 | 2010 | Now | E | 1/M | OK |
WD Black | 500 | 2011 | 2020 | D | 24/7 | OK |
WD Black | 2000 | 2011 | 2020 | D | 24/7 | OK | Identical x4
WD Essentials | 1000 | 2011 | 2015 | E | 30/M | F | Became read-only without prior warning
WD Blue | 1000 | 2012 | Now | D | 24/7 | OK | Identical x2
WD Blue | 1000 | 2012 | 2017 | D | 24/7 | F | Became read-only without prior warning
Laptop disk | 500 | 2013 | Now | L | 10/M | OK |
Laptop disk | 1000 | 2014 | Now | L | 10/M | OK |
Laptop disk | 1000 | 2015 | Now | L | 20/M | OK |
WD Essentials | 1000 | 2015 | Now | E | 1/M | OK |
WD Essentials | 1000 | 2015 | Now | E | 30/M | OK |
WD Elements | 1000 | 2015 | Now | E | 1/M | OK | Identical x2
WD Elements | 2000 | 2015 | Now | E | 1/M | OK |
WD Black | 2000 | 2017 | Now | D | 24/7 | OK |
WD Elements | 2000 | 2017 | Now | E | 1/M | OK |
WD Elements | 2000 | 2019 | Now | E | 1/M | OK |

Reliability calculations

From this table, we can see that I experienced:

  • Desktop disks - 2/14 failures (14%) over a typical usage period of 5 years. It is also interesting to note that no disks failed after their fifth year of usage. In other words, 2/14 failures for disks anywhere between 3-9 years in use.
  • Laptop disks - 0/6 failures (0%) over a typical usage period of 5 years.
  • External disks - 1/14 failures (7%) over a usage period of 5 years, 3/14 failures over a usage period of up to 10 years. In contrast to desktop devices, only 1 failure occurred in the first 5 years of any disk's life, and the remaining 2 occurred in advanced stages of their use.

  • Out of 5 failures, only 1/5 disks (20%) exhibited early signs of pre-failure.
  • Out of 5 failures, 3/5 (60%) resulted in read-only devices where data could be read from and partially salvaged.
  • Only 1/30 disks had a SMART error - and did not fail (can be considered false positive).
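
The per-type counts above can be reproduced by tallying the results table. The sketch below re-encodes each table row by hand as (type, from, to, result, identical units, failure note); the tuple layout, the short note strings and the tally helper are illustrative assumptions, not part of the original data set.

```python
from collections import defaultdict

# (type, first year, last year or None if still in use, result, identical units, failure note)
DISKS = [
    ("D", 2005, 2011, "OK(b)", 1, ""),
    ("D", 2005, 2009, "F", 1, "no warning"),
    ("D", 2006, 2012, "OK", 2, ""),
    ("E", 2008, None, "OK", 1, ""),
    ("E", 2008, None, "OK", 1, ""),
    ("E", 2008, 2018, "F", 1, "inaccessible, no warning"),
    ("L", 2009, None, "OK", 1, ""),
    ("L", 2009, None, "OK", 1, ""),
    ("D", 2009, 2011, "OK", 1, ""),
    ("E", 2009, None, "OK", 1, ""),
    ("E", 2009, 2017, "F", 1, "early signs, read-only"),
    ("L", 2010, None, "OK", 1, ""),
    ("E", 2010, None, "OK", 1, ""),
    ("D", 2011, 2020, "OK", 1, ""),
    ("D", 2011, 2020, "OK", 4, ""),
    ("E", 2011, 2015, "F", 1, "read-only, no warning"),
    ("D", 2012, None, "OK", 2, ""),
    ("D", 2012, 2017, "F", 1, "read-only, no warning"),
    ("L", 2013, None, "OK", 1, ""),
    ("L", 2014, None, "OK", 1, ""),
    ("L", 2015, None, "OK", 1, ""),
    ("E", 2015, None, "OK", 1, ""),
    ("E", 2015, None, "OK", 1, ""),
    ("E", 2015, None, "OK", 2, ""),
    ("E", 2015, None, "OK", 1, ""),
    ("D", 2017, None, "OK", 1, ""),
    ("E", 2017, None, "OK", 1, ""),
    ("E", 2019, None, "OK", 1, ""),
]


def tally(disks):
    """Return {type: [failed units, total units]} with 'Identical xN' rows expanded."""
    counts = defaultdict(lambda: [0, 0])
    for dtype, _start, _end, result, units, _note in disks:
        counts[dtype][1] += units
        if result == "F":
            counts[dtype][0] += units
    return dict(counts)


failures = [d for d in DISKS if d[3] == "F"]
read_only = sum("read-only" in d[5] for d in failures)
early_signs = sum("early signs" in d[5] for d in failures)

for dtype, (failed, total) in sorted(tally(DISKS).items()):
    print(f"{dtype}: {failed}/{total} failed ({failed / total:.0%})")
print(f"read-only at failure: {read_only}/{len(failures)}")
print(f"early warning signs: {early_signs}/{len(failures)}")
```

OK(b) is counted as not failed here, in line with the legend.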

Based on the data, the estimated MTTF is roughly 61,000 hours, with 5/30 failures over an average of ~7 years of use (taking into account total usage time and disk age). This means I could expect 1 in 6 disks to fail after being used constantly for about 7 years. The actual cumulative breakdown is: 1 failure after 4 years, 2 failures after 5 years, 3 failures after 6 years, and 5 failures after 10 years.
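
The unit conversion behind the 61,000-hour figure can be checked in a few lines. This sketch assumes the figure is simply ~7 years of constant use expressed in hours; how the 7-year average was weighted is not shown in the article. The evenly spread annual rate at the end is a crude simplification for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; leap days ignored

failed, total = 5, 30      # failures and disks, as quoted in the text
avg_years = 7              # approximate average time in use, as quoted

hours = avg_years * HOURS_PER_YEAR
fraction_failed = failed / total
# Crude average: spreads the failures evenly over the years in use.
avg_annual_rate = fraction_failed / avg_years

print(f"{avg_years} years of constant use = {hours:,} hours")
print(f"fraction failed: {fraction_failed:.1%} (about 1 in {round(total / failed)})")
print(f"average annual failure rate: {avg_annual_rate:.1%}")
```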

Year 5 is the riskiest with 2% normalized annual failure rate (no data redundancy).

In other words, practically, if I keep two copies of any given data, the likelihood of data loss is 2.5% over a decade, or 0.06% for three disks. So this kind of confirms my backup strategy from a while back, and also shows that it is important for you to keep multiple copies of important files, if you want them to outlast your hardware.
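
The reasoning behind keeping multiple copies can be sketched with a simple model: if each disk fails independently with probability p over the period of interest, all n copies are lost with probability p to the power of n. Both the independence assumption and the choice of p = 5/30 are assumptions made for this illustration (the function name loss_probability is hypothetical), and the exact percentages depend on the per-disk probability and time window, so the output need not match the figures quoted above.

```python
def loss_probability(p_disk: float, copies: int) -> float:
    """Chance that every copy is lost, assuming disks fail independently."""
    return p_disk ** copies


# Per-disk failure fraction from the study; treating it as a probability is an assumption.
p = 5 / 30
for copies in (1, 2, 3):
    print(f"{copies} copies: {loss_probability(p, copies):.2%}")
```

Real-world failures are not fully independent (a power surge, fire or theft can take out several disks at once), which is one more reason to keep copies on separate devices and in separate places.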

Conclusion

There you go. I hope you find this 15-year-long study valuable. Of course, any techie like me could do it. All techies hoard hardware like mad, and I'm sure most Dedoimedo readers have a bunch of computers and tons of hard disks strewn about, so it's just a matter of compiling the right data. And I'm sure every such compilation would be compelling. A compelling compiling, hi hi.

If you have any comments or suggestions about my findings, I'd love to hear them. Again, I don't have a massive data center, so I can't do an accurate comparative study between vendors, disk sizes and the like, so do take my results with a pinch of cardamom. But I believe my numbers are quite indicative for home usage scenarios, so if you're mulling how to handle your data down the long trouser leg of time, you have some indication of where to start, and how to hedge your odds. Take care.

Cheers.