Published: Jul 2011

Archiving your digital images

When it comes to data integrity and preservation, there are several risk factors to take into account:

Theft, possibly inadvertent

This is a substantial risk for physical media stored at your house: someone steals your laptop and a few external drives while they're at it. Alternatively, they take your camera along with a bunch of memory cards you hadn't yet downloaded that were in the same bag. Keep your media physically separate from anything that might be worth stealing. In the '90s a well-known football player in Canberra had his house burgled and lost his video camera - along with all the tapes of his kids growing up that just happened to be in the same bag.

Fire, flood, natural disaster

Only off-site copies will save you here. DVDs and other optical storage might survive a flood.

Media failure

Everything breaks down sooner or later. Top of the list is anything with moving parts (i.e. traditional spinning-rust disk drives). RAID arrays are not a panacea. In fact there are circumstances where they can make things worse.

Let's say you have 4 disks in your RAID-5 array and one of them fails. Everything keeps working. Maybe you get an orange light on the enclosure. Maybe you get some kind of alert from the operating system as well. It keeps working and you figure you'll replace that failed disk sometime soon. However, the remaining three disks are all working a little bit harder now, because the array has to reconstruct the missing data every time a block from the dead disk is needed. All the disks are the same age, and probably from the same batch off the production line. Pretty soon another one fails and then you've lost all your data, not just the data on the failed disks. Couple that with the fact that RAID doesn't protect you against power spikes, mechanical shock to the enclosure etc. and it starts to lose a bit of its gloss.

Media degradation

Entropy again - everything falls apart sooner or later. Optical media generally has an estimated lifespan in the decades, but the actual lifespan of a given disk can be much shorter. You have to be prepared to migrate data on optical media every so often if it's important.

Media obsolescence

Will you be able to read a DVD in 20 years' time? Will your external USB disk still be functioning? Will USB still be around? Will the filesystem on it still be recognised? Again, you have to be prepared to migrate.

File corruption

Always check your archival copies against the master copies, no matter what kind of media they're on. We're almost all using cheap commodity hardware, and errors from flaky hardware, buggy software or operator error creep in from time to time. Always make sure that any optical disks you write can be read back on a second machine.
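One way to check an archival copy against its master is a byte-by-byte comparison of the two directory trees. The following is a minimal sketch in Python using only the standard library; the function name find_mismatches and the directory arguments are made up for illustration, and it assumes both copies are mounted as ordinary directories at the same time.

```python
import filecmp
from pathlib import Path


def find_mismatches(master_dir, archive_dir):
    """Return relative paths of master files that are missing from,
    or differ in, the archive copy (hypothetical helper)."""
    master_root = Path(master_dir)
    archive_root = Path(archive_dir)
    problems = []
    for master_file in sorted(master_root.rglob("*")):
        if not master_file.is_file():
            continue
        relative = master_file.relative_to(master_root)
        archive_file = archive_root / relative
        # shallow=False forces a comparison of file contents rather than
        # trusting matching size and modification time.
        if not archive_file.is_file() or not filecmp.cmp(
            master_file, archive_file, shallow=False
        ):
            problems.append(relative.as_posix())
    return problems
```

An empty list means every file in the master was found, intact, in the archive. It only checks in one direction: extra files that exist only in the archive are not reported.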

File format obsolescence

What are the chances that your camera raw files will make any sense to software 20 years from now? Ideally, store copies of your images in TIFF format alongside your raw files, or (as I do) include a copy of dcraw source code that knows your camera's file format.

Stupidity

We all do dumb things from time to time, so make it hard to shoot yourself in the foot. Write-once optical media scores well here - once something's on the disk there's no way to modify it, accidentally or otherwise. To get images off your memory cards, copy them to your hard disk, give them read-only permissions and put the card back in your camera bag. I have my computer set up so that memory cards from my camera will mount read-only by default, so nothing I do on my computer can modify data on the card.
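The "give them read-only permissions" step could be done along these lines. This is a sketch in Python, assuming POSIX-style permission bits (on Windows, os.chmod can only toggle the read-only flag); make_read_only is a hypothetical helper name, not part of any existing tool.

```python
import os
import stat
from pathlib import Path


def make_read_only(directory):
    """Clear the write bits on every regular file under directory
    (hypothetical helper; directories themselves are left alone)."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    for path in Path(directory).rglob("*"):
        if path.is_file() and not path.is_symlink():
            mode = stat.S_IMODE(path.stat().st_mode)
            # Keep the read and execute bits, drop only the write bits.
            os.chmod(path, mode & ~write_bits)
```

This guards against accidental edits, not deliberate ones: the owner can still restore write permission, and on POSIX systems a file can still be deleted if its directory is writable.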

Having outlined the risks, the next step is to figure out how much you care. In reality, most of us are likely to make only a small number of truly remarkable or personally important images over a lifetime, and it's likely that you'll know them when you see them. In this case, maybe the best strategy is to be extra-diligent with storing and migrating your "keepers" and less so with the rest. This is basically the thinking behind my current arrangement.

  • Mount camera memory cards (read-only), copy to disk and set read-only permissions.
  • Generate a checksum file with md5sum from the files on the card.
  • Unmount the card
  • Check the copy using md5sum -c
  • Copy all files to 2 sets of DVDs. Check those with md5sum -c
  • On disk, delete all but the "keepers"
  • Back up to an external disk
  • Take one set of DVDs and the disk to an off-site location. Swap the disk with my alternate off-site backup disk that was living there
  • Back up to the swapped off-site disk
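The checksum steps in the list above can be sketched in Python for anyone without md5sum to hand. This is an illustration under stated assumptions, not a replacement for the real tool: the function names md5_of, write_manifest and check_manifest are made up, the manifest uses the "hash, two spaces, filename" layout that md5sum writes, and filenames are assumed to contain no newlines or other characters that md5sum would escape.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(source_dir, manifest_path):
    """Write one 'hash  relative/path' line per file under source_dir.
    Returns the number of files listed."""
    source_root = Path(source_dir)
    lines = []
    for path in sorted(source_root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(source_root).as_posix()
            lines.append(f"{md5_of(path)}  {relative}")
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return len(lines)


def check_manifest(manifest_path, copy_dir):
    """Return the manifest entries that are missing or altered in copy_dir."""
    copy_root = Path(copy_dir)
    failures = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        candidate = copy_root / relative
        if not candidate.is_file() or md5_of(candidate) != expected:
            failures.append(relative)
    return failures
```

The idea is to call write_manifest once against the mounted card, then call check_manifest against each copy in turn (hard disk, each DVD set, each backup disk). An empty list means that copy matches what was on the card. Keep the manifest file outside the directory being hashed.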