Fixity checking
Fixity is the property of a digital file remaining unchanged over time. It is verified with a checksum: a short sequence of bytes, usually displayed as a string of hexadecimal characters, that is computed from the file's contents and acts as a "fingerprint" that is, for practical purposes, unique to that file. Checksums are an essential tool for detecting file corruption so that it can be remediated. Corruption can happen both in transit and at rest.
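As an illustration of what calculating a checksum involves, here is a minimal sketch in Python. It assumes SHA-256 as the default algorithm and uses a hypothetical helper name, file_checksum. Neither is taken from Pocket Archive itself, which may use a different algorithm or approach.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hexadecimal checksum of the file at `path`.

    Hypothetical helper for illustration only. The file is read in
    chunks so that large files do not have to fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_checksum on the same unchanged file always yields the same string, while any change to the file's contents yields a different one.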
In-transit corruption is far more frequent and happens when a file is moved, either within the same machine or across the network. A protection against this problem is to calculate a checksum for the file as soon as it is created, and to store this checksum somewhere safe. The checksum is then calculated again after the file is moved, for example when it gets archived, and compared with the stored value. If even one bit of the file has changed, the checksum will look totally different. This way, one can verify that a file was archived exactly as it was initially provided.
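The check described above can be sketched as follows. This is an illustrative example under stated assumptions, not Pocket Archive's actual code: the function names sha256_of and copy_with_fixity_check are hypothetical, and SHA-256 is assumed as the checksum algorithm.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity_check(source, destination, expected=None):
    """Copy `source` to `destination` and verify the copy's checksum.

    `expected` is a checksum recorded earlier, for example when the file
    was created. If it is not given, the source is hashed before copying.
    Raises ValueError when the copy does not match.
    """
    if expected is None:
        expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise ValueError(
            f"Fixity check failed for {destination}: "
            f"expected {expected}, got {actual}"
        )
    return actual
```

Passing a checksum that was recorded at creation time as expected gives the stronger guarantee, since it also catches corruption that happened before the copy started.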
At-rest corruption is rarer but does happen, mostly when the storage medium itself gets corrupted. Checking the fixity of at-rest files means routinely scanning the whole archive, recalculating every checksum, and comparing it with the stored value. This is a lengthy operation, and some argue that a file is more prone to corruption while it is being accessed, in this case to calculate the checksum.
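A routine at-rest scan could look roughly like the sketch below. It assumes the stored checksums are available as a dictionary mapping relative file paths to SHA-256 hex digests. That layout, and the names sha256_of and audit_archive, are assumptions made for this example and do not describe how Pocket Archive stores or checks its checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_archive(root, manifest):
    """Compare every file listed in `manifest` with its stored checksum.

    `manifest` maps paths relative to `root` to expected hex digests.
    Returns a dict of {relative path: problem} for files that are
    missing or whose checksum no longer matches. An empty dict means
    every listed file passed.
    """
    problems = {}
    for relative_path, expected in manifest.items():
        target = Path(root) / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif sha256_of(target) != expected:
            problems[relative_path] = "checksum mismatch"
    return problems
```

Because every listed file is read in full, the running time grows with the total size of the archive, which is why such scans are usually scheduled rather than run on demand.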
Pocket Archive calculates and stores checksums on deposit. This allows fixity to be verified at rest or for the replicas. It cannot, however, verify fixity at the source, on the depositor's workstation. In a future version, Pocket Archive will provide the option for the depositor to calculate the checksum before deposit and provide it with the laundry list, so that the integrity of the file can be verified before archiving.
Pocket Archive also verifies the computed checksum on major file movements, such as when moving deposited files from a temporary staging area to their archival location and when creating a backup.