Community-approved digitization standards

File obsolescence, i.e. the inability to open an old digital file, is not caused only by the decay of its physical storage medium: the file format itself may have become obsolete, with no current software able to read it. If the file was created by a commercial application, a version of that software able to read it may no longer be available for purchase.

The digital preservation community has compiled a list of file formats that are deemed "preservation-worthy", meaning that they are expected to remain supported by IT systems for a long time to come. When scanning images, acquiring video, creating text documents, etc., one should strive to save the file in one of those formats. If the file was created by a third party, one should find a preservation-worthy format to which the original file can be converted without loss of information.

In general, a lossless format (e.g., TIFF rather than JPEG for images) should be used. Likewise, an open format maintained by a community should be preferred to closed, proprietary formats and to open formats maintained by a single commercial entity (e.g., the OpenDocument ODT text format used by LibreOffice rather than Microsoft Word DOC, TIFF rather than Photoshop PSD).
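As a rough illustration of the two rules above, the following Python sketch sorts files by extension into "keep", "convert" and "lossy" groups. The mapping is hypothetical and built only from the examples mentioned in this section (with MP3 added as a common lossy audio format); it is not the community's list, and `review_format` is not part of Pocket Archive. An extension is also only a hint about a file's actual contents.

```python
from pathlib import Path

# Illustrative only: a tiny mapping built from the examples in this section.
# It is NOT an authoritative list of preservation-worthy formats.
PREFERRED = {".tif", ".tiff", ".odt"}
SUGGESTED_TARGET = {
    ".psd": ".tiff",  # Photoshop PSD -> TIFF
    ".doc": ".odt",   # Microsoft Word DOC -> OpenDocument Text
}
LOSSY = {".jpg", ".jpeg", ".mp3"}


def review_format(filename: str) -> str:
    """Return a short note about a file's format, judged by extension alone."""
    ext = Path(filename).suffix.lower()
    if ext in PREFERRED:
        return "ok: already in a preferred format"
    if ext in SUGGESTED_TARGET:
        return f"convert: consider {SUGGESTED_TARGET[ext]}"
    if ext in LOSSY:
        return "lossy: recapture from the original if possible"
    return "unknown: check a published format policy"


if __name__ == "__main__":
    for name in ["scan_001.TIF", "letter.doc", "photo.jpg", "notes.xyz"]:
        print(f"{name}: {review_format(name)}")
```

In practice, the mapping would be replaced by the format policy your institution follows.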

Many major libraries publish their criteria for selecting primary and secondary file formats when saving for preservation. Duke University offers an example that is both concise and comprehensive.

The US Library of Congress has a detailed reference for digitization standards related to individual medium formats.

Most of these guidelines have been conceived with large and well-funded institutions in mind, and some may need to be adapted to the use cases that Pocket Archive is built for. For example, scanning a text document at a resolution that is just enough to make the text clearly legible may be the only affordable option if there are a lot of books to scan and not a lot of space to store them.
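To get a feel for the resolution trade-off described above, the sketch below estimates the uncompressed size of a scanned page. The page size (US Letter), the resolutions and the 24-bit color depth are assumptions chosen for illustration; real files are usually smaller once lossless compression is applied.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Estimate the raw (uncompressed) size in bytes of one scanned page.

    width_in, height_in: page size in inches
    dpi: scanning resolution in dots per inch
    bits_per_pixel: color depth (24 = 8 bits each for red, green, blue)
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed example: a US Letter page (8.5 x 11 inches) in 24-bit color.
    for dpi in (150, 300, 600):
        size_mb = uncompressed_scan_bytes(8.5, 11, dpi) / 1_000_000
        print(f"{dpi} dpi: about {size_mb:.1f} MB per page")
```

Because the pixel count grows with the square of the resolution, halving the resolution cuts the raw size to roughly a quarter, which adds up quickly over thousands of pages.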

Also, "best" practices recommend preserving at least an archival master and a production master file for every digital capture. The differences between the two are subtle (but meaningful), and in some cases one file can fulfill both roles. In those scenarios, keeping a single file can cut the storage size for some file formats in half.

Sometimes the only available digital file is lossy (e.g., JPEG or MP3) and the original is not accessible for a better digital capture. In that case, the best available file should be treated as the master file, even if it is of sub-standard quality.

Pocket Archive does not make a judgment on the quality of the input data. These decisions are made prior to the archiving step and Pocket Archive will happily accept any files you give it.