At one level we should all be very worried about anything we post online, because (as has been proven numerous times) anything you post can stay around forever. What you flippantly say today can come back and bite you in the ass at any point, for decades to come. Just look at the current US presidential election. There are currently thousands of journalists trawling through every single word said, written or recorded by every prospective candidate, and anyone they ever knew, looking for anything to keep people watching across the next ad break.
Conversely, it’s almost impossible to imagine that all the data being generated today will be kept forever. Through a mix of data corruption, replacement of legacy technology and the efforts people make to delete older or boring material, it’s hard to know what data will be kept.
Today’s storage media, such as disk drives, tape backups, solid state / flash memory, CDs and DVDs, all have finite lives.
• Tapes rely on the magnetic properties of metalized particles sprayed onto the surface of a long tape. Over time the magnetic alignment of the material does decay, and because tape is wound as a spool, there is also the impact of different layers of the tape interacting. In fact, since the most modern tapes use techniques to squash ever greater volumes of data into each square inch, newer tape technologies may actually be getting less reliable in terms of longevity, even with the latest data correction and recovery algorithms and more advanced physical materials.
• Hard disks have the data stored on a magnetic layer sprayed onto disks, which spin at high speed with the read/write head sitting microns above the surface. There is a lot of continual physical movement and so they do wear out. They may last for decades, but ask anyone who runs a datacenter how often they have to replace blown drives: it is a daily activity. Modern disk drives are far more reliable than the ones of decades ago, but there are just so many in use that the math means even a small percentage of a large number is a large number.
• CDs and DVDs could in theory last for hundreds of years, but writable ones are much more delicate than you would imagine. They rely on organic compounds as the writing layer. A small scratch on the label side can allow moisture to get in, causing the written data to erode. Yes, DVDs “could” last a long time, but it’s not guaranteed. And in the same way that it always rains after you wash the car, that most critical disk is always the one that has a problem being read back.
• Solid state drives and flash cards don’t rely on magnetic particles or moving parts, but their longevity is not that clear, because they really haven’t been around for that long. Depending on the technology used, some people say the storage time for offline drives could be as low as a few months. USB sticks seem to have longer planned lives, maybe in excess of 20 years. But we are in the “wait and see” phase.
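The “small percentage of a large number” point in the hard disk bullet is easy to check with some rough arithmetic. The sketch below is only an illustration: the fleet size and the annual failure rate are made-up numbers, not figures from any real datacenter.

```python
def expected_failures_per_day(drive_count: int, annual_failure_rate: float) -> float:
    """Average number of drives expected to fail per day across a fleet."""
    return drive_count * annual_failure_rate / 365


# Hypothetical fleet: both numbers are assumptions for illustration only.
fleet_size = 100_000          # drives in service
annual_failure_rate = 0.01    # 1% of drives fail in a given year

per_day = expected_failures_per_day(fleet_size, annual_failure_rate)
print(f"{per_day:.1f} failures per day")  # prints "2.7 failures per day"
```

So even if 99% of the drives get through the year without trouble, someone is still swapping out a few drives every single day.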
There really is no medium in use today that can guarantee stored data will last hundreds or thousands of years. The most reliable media we know of are still stone tablets and parchment, but it’s fair to say these have very slow write and read speeds, and the data capacity per square inch really doesn’t work for today’s data volumes ☺
The technology used just thirty years ago for long-term storage probably worked quite well, but there are now so few drives available that can read the data stored at that time that it’s not clear how good it really was.
Cloud storage sounds great, as you are making the problem someone else’s. The idea is you pay them, and they take on the burden of making sure your data is retrievable in the future. This is probably a good idea, as a company that spends all its time thinking about just storage should be able to keep moving data to new types of devices as they become available, and so keep it readable. The caveats are:
1. You need to keep paying them. Forget and your data could be deleted.
2. They had better stay in business.
As the years go on, the volume of new data being created explodes, and the information about the data (the metadata) created along with new data continues to improve. This means that older data is harder to keep track of than modern data, and data can get old pretty quickly. How much effort do you put into keeping data you created five years ago? You probably still have it, but could you find it if you needed it?
Data also becomes corrupted quite easily. Things happen. I have a lot of old emails that were created in formats that I no longer have any way of reading. I have old databases created in programs that no one has used for decades. This data is in effect now dead. I can try to convert it into modern formats, but in the process I’m hoping that every aspect of how the old data was created is preserved. In my experience, converting old formats into new formats never works quite as well as expected.
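One modest defence against quiet corruption is to record a checksum for every file when you archive it, then recompute the checksums later and compare. The sketch below is a minimal illustration using Python’s standard library. The function names are my own, and it only tells you that the bytes have changed or a file has gone missing. It does nothing for the format problem, where the bytes are perfect but nothing can open them.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files from an old manifest that are now missing or different."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

The idea is to run build_manifest when the archive is written, keep the result somewhere safe, and run changed_files against it every year or so. An empty list means the bytes are still what you stored.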
We all have more and more data to manage, and the older bits get less and less attention, until they are effectively unusable.
It doesn’t matter how you store it: the world moves on and old data loses its value more quickly every year. Old data becomes harder to manage, and while it may still exist somewhere, it’s almost impossible to find and use.
There are two situations you can rely on:
1. If you said something once that is now embarrassing, that piece of information is most likely to survive and be readable.
2. If you want to access data that you created more than 5 years ago, it’s never easy.