How to Properly Archive Your Digital Files

The original proposal for the World Wide Web, written by Tim Berners-Lee in 1989, is an important piece of internet history. It also can't be opened properly on most modern computers.

John Graham-Cumming, a British software engineer and writer, attempted to open the Word document containing the proposal. Modern versions of Microsoft Word and Apple’s Pages both utterly failed to open the file, as he outlined in a blog post. The open-source word processor LibreOffice worked, albeit with messy formatting. Graham-Cumming ultimately found a PDF exported by CERN in 1998, which was the only way he was able to see the document as it existed in 1989.

It's worrying that such an important piece of history, in such a common file format, could be almost completely lost to the passage of time and software updates. Anyone with a collection of old digital documents, photos, and videos might wonder whether the same thing will happen to their files. It turns out that's exactly the sort of question digital archivists deal with all the time, so I reached out to one.

“Twenty years, in the digital realm, is ancient,” says Lance Stuchell, director of digital preservation services at the University of Michigan. His team is frequently tasked with recovering digital files from old computers and storage media. “We have a lab that can deal with old media—floppy drives, CDs, older computers. We can get that off of those types of media and move it into our preservation system while ensuring we don’t mess it up while we’re doing it.”

But getting the files off the drive is just the first step: Then you have to open them, and leave them in a state that will be openable for decades to come. It’s a job that’s given Stuchell a reason to think about strategies for keeping documents around as long as possible. I asked him what those of us who aren’t professional archivists should do to ensure our files last decades.

Use Open Formats

The Word document I mentioned before could no longer be opened by Microsoft Word because the software has changed over time. This is part of the challenge of archiving digital files.

“With physical stuff, the less you look at it the longer it lasts,” Stuchell says. “Digital stuff, we’re constantly fighting with obsoleteness. As the file moves through time, it’s losing information.”

Updates to software like Microsoft Word mean that files that opened fine in the '80s don't necessarily open in the 2020s. Part of the problem: Microsoft, and only Microsoft, controls the file format, and for decades only Microsoft fully knew how it worked. For this reason, Stuchell encourages people to export files to an open file format, especially files they want to keep accessible for the long term.

For documents, he recommends PDF/A, an open standard based on Adobe's PDF format that embeds everything the file needs in order to be opened, including the fonts used in the document. Microsoft Office, LibreOffice, and Adobe Acrobat all support exporting to PDF/A, so it's relatively easy to make such a file. He suggests converting any document you want to keep to that format.

“PDF/A is an open standard,” says Stuchell, and that principle applies to all of your documents. An Excel spreadsheet that opens fine now might not open in 20 years, but if you export that spreadsheet as a CSV file (essentially a plain text file that any spreadsheet application can understand), you can be reasonably confident the data will still be readable decades from now. You'll lose formulas and formatting in the process, but the values themselves survive.
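
If you want to see why CSV holds up so well, here's a minimal Python sketch using only the standard library's csv module. The file name and rows are hypothetical stand-ins for a real spreadsheet; the point is that the saved file is ordinary text, and reading it back requires no spreadsheet application at all.

```python
import csv
from pathlib import Path

# Hypothetical rows standing in for a spreadsheet's cell values.
rows = [
    ["date", "item", "amount"],
    ["2020-01-15", "Groceries", "54.20"],
    ["2020-01-16", "Train ticket", "12.50"],
]

path = Path("budget.csv")  # hypothetical file name

# The csv module docs recommend opening CSV files with newline="".
with path.open("w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# The saved file is plain text you could read in any text editor.
print(path.read_text(encoding="utf-8"))

# Reading it back needs nothing but a CSV parser; every cell value returns unchanged.
with path.open(newline="", encoding="utf-8") as f:
    restored = list(csv.reader(f))

print(restored == rows)  # True
```

Note that everything comes back as text: a CSV file stores values, not formulas, number formats, or fonts.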

Basically, if a file on your computer can only be opened by a specific piece of software, and that software is controlled by a single company, you should probably export it to an open format. It's the most reliable way to future-proof it.

Keep Photos and Videos Fresh

There's a lot less to worry about when it comes to photos, Stuchell says, because we've been using the same file formats (JPEG, PNG, and TIFF) for a long time. All three are open formats that a wide variety of software can read.

But that doesn't mean all of your photos are future-proof. For one thing, if you edit and resave JPEGs a lot, you can lose image quality over time.

“JPEGs aren’t bad, it’s just that every time you edit and save, it will throw out a little bit more information,” Stuchell says. This effect is called generation loss. “One or two edits won’t be detectable, but keep this in mind: If you’re going to edit it a lot, make a copy each time and edit the copy.”
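
For anyone who manages photos with a script, the "edit a copy" habit is easy to automate. The sketch below is one hypothetical way to do it in Python (the `make_working_copy` function and the `_edit1`, `_edit2` naming scheme are my own illustration, not something Stuchell prescribed). It never touches the original file and returns a fresh copy to open in your editor.

```python
import shutil
from pathlib import Path


def make_working_copy(original: Path) -> Path:
    """Copy `original` to the next unused name (photo_edit1.jpg, photo_edit2.jpg, ...).

    The original file is only read, never modified; edits go to the returned copy.
    """
    n = 1
    while True:
        candidate = original.with_name(f"{original.stem}_edit{n}{original.suffix}")
        if not candidate.exists():
            break
        n += 1
    # copy2 copies the file's bytes and also tries to preserve timestamps.
    shutil.copy2(original, candidate)
    return candidate
```

Calling `make_working_copy(Path("vacation.jpg"))` on a hypothetical `vacation.jpg` would create `vacation_edit1.jpg` the first time and `vacation_edit2.jpg` the next, so every edit starts from the untouched original rather than from a file that has already been resaved several times.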

Also be aware that some photos, especially RAW files from your camera, might be stored in a proprietary format.

“You have to be careful because many cameras default to their own version of RAW, which is highly proprietary,” Stuchell says. He recommends converting such photos to Digital Negative (DNG), an open format that's a safer choice for preserving RAW files.

Videos also aren't much of a problem, since most video files are encoded using open formats at this point. But as with photos, Stuchell advises against editing the same video file over and over. Edit a copy instead.

“That’s the great thing about digital: you can make a million copies,” Stuchell says.

Back Up Absolutely Everything

Converting all of your precious files to open formats does you no good if those files are lost. That's why Stuchell repeatedly emphasized the importance of backing up your files. Ideally, you would have three copies of every file, with at least one of those copies kept off-site. He mentioned the automatic backup services Backblaze and CrashPlan as good tools for the job; we recommend combining Backblaze with a local backup.
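
If you'd like to script part of this yourself, here's a minimal Python sketch, assuming your backup destinations are ordinary folders such as an external drive or a cloud-synced directory (the `back_up` function and any folder paths are hypothetical). It copies a file to each destination and then compares SHA-256 checksums to confirm each copy matches the original. That checksum step (often called fixity checking in digital preservation) is my addition, not something Stuchell brought up. The original plus two destinations gives you three copies; making sure one of them lives off-site is still up to you.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination folder.

    Returns a mapping from each new copy to True if its checksum matches
    the original's, False otherwise.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # overwrites an older copy with the same name
        results[copy] = sha256_of(copy) == expected
    return results
```

Reading in chunks keeps memory use flat even for large video files. Rerunning `sha256_of` on your backups now and then is a cheap way to catch a copy that has silently gone bad.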

The specific backup system you use doesn’t matter nearly as much as having some kind of backup strategy in place.
