Archives: Redux

My recent Archives article was met with some controversy and debate, which is great. I love controversy and debate, and a terrific discussion ensued. That discussion has led me to think harder about my archive plan, so I'd like to follow up with some of the specifics of that plan and expand on a few of the ideas it touched on.

It's Personal

In the Archives post I basically said I'd be archiving all my "non-essential data" to hard drives and reserving optical media for only the most essential data. I should first point out that what I am talking about here is my personal data. This is not necessarily a method I'd use at work or for a client. Archive methods should be specific to the needs of the situation.

The Future

One of my rationales for using hard drives was that hard drives are more likely than optical discs to be readable in ten years with the equipment of the day. It's this particular idea that received a great deal of criticism, and I'm starting to see why.

Just a few weeks ago I had occasion to archive some museum kiosks that ran on some very old PowerMacs. Luckily, these PowerMacs were just barely of the era when ATA drives were starting to be used as internal drives on Macs, so getting the data off them was fairly straightforward: I simply hooked the PowerMacs' ATA drives up to a FireWire enclosure and archived the data to DMG. Shortly thereafter, I wanted to perform a similar process with a slightly earlier vintage PowerMac. This machine, however, contained a SCSI drive, and finding a way to access and archive that drive proved almost impossible without going to extreme lengths and making obscure hardware purchases. Had there been some kind of optical archive of these systems, I almost certainly would have been able to read it with today's equipment.

I'm not sure what the future of optical media is. Until recently, I was pretty convinced it was not long for this world and would surely be displaced as a distribution medium by the web. But after thinking on the comments to that article, and talking to people way smarter than me on such matters, I realize I may be wrong. And if that's the case, optical discs will be more likely than hard drives to be readable ten years from now. Whatever the case, this is certainly true for media from ten years ago: you're more likely to be able to read ten-year-old optical media than hard drives of that era.

Non-Essential Data

That said, I'd like to clarify the "non-essential data" qualifier I tossed into the article. To be clear, I'm not completely eschewing optical media for my archives. What the article represented was my shift from optical as my only form of backup to hard drives as a significant, if not primary, form of data backup and archive.

To get even more specific, in the past I archived everything to optical media. But with the huge amounts of data I now collect, that's not really practical anymore, nor is it necessary. So these days the bulk of my data will be archived to hard drive: large, non-essential data such as ripped DVDs, video captures from tape, software installers, and anything with a shelf life (i.e., data that is only useful for a limited time, or that relies on old versions of software or hardware). Hard drives make this data easy to store and retrieve, and they should last long enough. The idea is that this data isn't forever data. It's stuff I want to keep around for a while, but if I haven't needed it in ten years, I probably won't ever need it again.

More important data I'll be burning to optical. There's really not that much of it: big video projects (sans captured media), photos, my websites, contacts, the stuff it would really kill me to lose. That way I have two backups of it (I'll also keep it in the hard drive archive), and it will live on a more robust medium that may have a better chance than hard drives of being readable in the future.

So what's really going on here, for me, is a prioritization of my data backups that's reflected in my archive procedures. With this prioritization, I can now rely much more heavily on hard drives as an archive medium. Using hard drives, I can back up and access a lot more stuff with much greater ease and speed, which lets me reserve optical media for only the most important data. But make no mistake: optical will still be an important component of my backup strategy.

Live Archive

I also wanted to take a minute to mention one way in which hard drives are somewhat future-proof and useful as a true archive: the idea of a live, rolling archive.

In the lab where I used to work, we kept (or tried to keep) a long-term archive of all student work, accessible to incoming students so that they could look at and benefit from the work of their predecessors. Our students made all sorts of work, from web projects to video and animation projects to installations, and that work was initially archived to all manner of media, from tape to optical, with no standard. By the time I got involved, there were projects going back ten or fifteen years, and it was becoming clear that, no matter what medium we used today, we'd need to re-archive everything every so often as data access techniques and hardware evolved.

I believe that in a case like this, where the archive is constantly growing, reaches back well over ten years, and must always remain accessible, the hard drive makes a sound archive medium. The implementation would be fairly simple in concept: the entire archive is kept on a hard drive to which the community has access. As the archive grows, say every few years, it is transferred to larger storage. As storage standards change, it is transferred to the latest and greatest medium of the day. Of course, redundant backups of the entire archive are also kept. But since this data is constantly being re-archived, hard drives (or whatever replaces them in the future) make for a sensible way to keep a rolling, live archive, and they reduce the need for more permanent solutions like optical. Perhaps Chucky, in the comments to Archives, put it best:

"In other words, hard drive archival demands cycling your backups over time to new hard drives with fresh magnetic media and evolving HD interfaces."

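The cycling step of a rolling archive, copying everything to newer, larger storage and making sure nothing got lost along the way, can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not a description of the lab's actual process: the function names `sha256_of` and `migrate_archive` are hypothetical, and it assumes both the old and the new drive are mounted as ordinary directories.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so large video files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_archive(src: Path, dst: Path) -> list[Path]:
    """Hypothetical helper: copy every file under src into dst, keeping relative paths,
    and return the relative paths whose copies fail checksum verification."""
    mismatches = []
    for file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = file.relative_to(src)
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file, target)  # copy contents and, where supported, timestamps
        if sha256_of(file) != sha256_of(target):
            mismatches.append(rel)
    return mismatches
```

Calling `migrate_archive(Path("/Volumes/OldArchive"), Path("/Volumes/NewArchive"))` (hypothetical mount points) would copy the archive and return the files whose copies didn't match; an empty list means every checksum agreed. One caveat: a file read back immediately after being written may come from the operating system's cache rather than the new disk, so re-checking after remounting the drive is a stronger test.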
I guess the overarching lesson here, if there is one, is that your archive method should reflect the specifics of your situation; there is no one archive method for everyone. The corollary to that, for me, is that hard drives can (and will) now be a significant part of my archive method.