Submission guide
Terms appearing in bold are referenced in the glossary.
Archival process overview
Pocket Archive receives new content, and updates to existing content, via submissions. A submission is an individual contribution to the archive that can add, update, or delete resources, or do any combination of these. A submission may include multiple resources, which can be related but do not have to be.

1. Archivist selects and lays out digital resources to be archived on their own workstation.
2. Archivist creates a laundry list that includes an inventory of the resources and their metadata. This, together with the files and folders previously prepared, constitutes the SIP.
3. Archivist transfers the SIP to the Drop box: first the files and folders, then the laundry list.
4. Upon receipt of the laundry list, Pocket Archive processes the incoming materials and archives them.
5. Pocket Archive generates a report after the process is complete (regardless of whether it succeeded or failed).
6. Depending on setup, Pocket Archive may delete the SIP from the Drop box if the submission succeeded.
7. Depending on setup, Pocket Archive may (re-)generate the presentation.
8. If the archivist wants to update the archived resources, they can either request a full copy of the SIP (or, to update metadata only, just the laundry list), edit it and/or replace files, and re-submit the new SIP; or edit individual resources via the admin interface.
9. The archivist can remove a resource and, optionally, all its members at any time.
Processing of the SIP (point 4 above) either succeeds or fails as a whole. This means that a submission will never perform only a part of the task that it is meant to complete. This is called an atomic operation and it is designed to ensure consistency of the data.
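The all-or-nothing behaviour can be pictured with a minimal Python sketch. It is an illustration only, not Pocket Archive's implementation: `process_submission`, the dictionary-based `archive`, and the single validation rule are all hypothetical. The idea is to validate and stage every resource first, and to touch the archive only once everything has passed.

```python
def process_submission(resources, archive):
    """Stage every resource, then commit them all in a single step.

    Hypothetical sketch: `archive` is a plain dict keyed by resource ID,
    and the only validation is that `content_type` is filled in.
    """
    staged = {}
    for res in resources:
        if not res.get("content_type"):
            # One malformed resource aborts the whole submission.
            raise ValueError(f"resource {res.get('id')!r} has no content_type")
        staged[res["id"]] = res
    # Reached only if every resource was valid: commit everything at once.
    archive.update(staged)


archive = {}
good = {"id": "Sg9hYIISjRjlkP62", "content_type": "still_image"}
bad = {"id": "7hic19YTXA8Fudxo", "content_type": ""}

try:
    process_submission([good, bad], archive)
except ValueError as err:
    print("submission failed:", err)

print(len(archive))  # 0: the valid resource was not archived either
```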
Individual steps are described in detail in the following chapters.
Submission Information Package structure
A submission is performed by preparing a Submission Information Package, or SIP, which consists of data, i.e. files optionally arranged in a curator-defined folder hierarchy, and metadata, the latter gathered in a single file called a laundry list; and sending them both to Pocket Archive for processing.
A working laundry list example linking to local files, used for testing, is available as a quick reference. Other examples are illustrated further down in this document.
As the above life cycle chart shows, the SIP is a disposable asset. Once it is successfully archived, it can be deleted. The full SIP can be regenerated by the archive and retrieved at a later time.
The original files on the archivist's workstation can optionally be kept and/or copied to local storage. This is strongly recommended, at least until Pocket Archive reaches a stable status and can be exclusively relied on for long-term preservation. More copies mean more chances to recover data from corruption or loss, but also higher storage costs.
Source file & folder layout
Preparation of the SIP begins with selecting the materials to submit. Generally, it is good practice to select a group of works more or less related to one another, e.g. a small coherent collection, or a day's work within a large collection that may take long to complete. It is not critical to get this part perfectly right, as more can be added to the archive at a later time. It is more important to keep submissions not too large, as a single malformed element can cause the whole submission to fail, and not too small, to avoid too many iterations that can become confusing. Submissions of tens to hundreds of files are in a quite safe range.
The arrangement of files and folders is important; the ordering of elements within a folder is less so. A file or sub-folder inside a parent folder creates a membership relationship between the two, so that, for example, one can create the following structure:
    my_collection
    |
    `- work1
    |  |
    |  `- file1.tiff
    |  |
    |  `- file2.pdf
    |
    `- work2
       |
       `- file3.mpg
This creates a collection, my_collection, with two members, work1 and
work2, the former containing file1.tiff and file2.pdf, and the latter
containing file3.mpg.
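How a folder layout translates into membership can be sketched in a few lines of Python. This is only an illustration of the principle: the function name `membership` and the dictionary it returns are hypothetical and do not reflect Pocket Archive's internal data model.

```python
from pathlib import PurePosixPath


def membership(paths):
    """Map each folder to its direct members, given SIP-relative paths."""
    members = {}
    for p in paths:
        path = PurePosixPath(p)
        parent = str(path.parent)
        if parent != ".":  # top-level entries have no parent in the SIP
            members.setdefault(parent, []).append(str(path))
    return members


paths = [
    "my_collection",
    "my_collection/work1",
    "my_collection/work1/file1.tiff",
    "my_collection/work1/file2.pdf",
    "my_collection/work2",
    "my_collection/work2/file3.mpg",
]

for parent, children in membership(paths).items():
    print(parent, "->", children)
```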
The ordering of files or folders in a SIP is defined in the laundry list, as we will see further down, so using file names to force a certain order is not necessary (although it can provide a good starting point for large lists of files or folders under a parent).
Some file and folder structures may be used in future versions of Pocket Archive to create more metadata.
Empty folders can be created and submitted: they can be used as placeholders for resources that have no files directly related. But the same effect can be obtained by other means with the laundry list.
Laundry list
Once the selection of files to be included in the SIP is complete, a laundry list is compiled. As the name suggests, this is an inventory of all the resources that go into the submission; but it provides much more information than that, by defining metadata and relationships between resources.
The laundry list is a CSV file and may be edited in any application that
supports CSV reading and writing. Care must be taken to export the file to CSV.
In LibreOffice, for example, "Save" writes the file in .ods format, which is
not usable as a laundry list. The spreadsheet must instead be exported in
.csv format.
The use of LibreOffice is the preferred method for editing laundry lists, and this project may provide additional utilities specific to LibreOffice to facilitate laundry list authoring.
Examples of laundry lists are in the test directory of the Pocket Archive code.
Most of these are tiny examples with little interesting content, each
highlighting a specific feature. Be aware of invalid laundry lists (the
pkar_submission-bad-… ones) and ones meant for updates only (the
pkar_submission-update-… ones), which may be invalid without a prior submission.
Multi-sheet documents
Many spreadsheet applications allow grouping multiple tables or sheets in one file. CSV supports only one table per file. While some may find it convenient to keep multiple laundry lists in one spreadsheet file, one must take care to export each sheet individually as a CSV.
Laundry list format
The first row of a laundry list is reserved for the header, which indicates the
field names. These can be in any order, but following a specific order is
recommended. The order used in this document and in all laundry lists
automatically generated by Pocket Archive is: content_type, id,
source_path, and then all ordinary fields in alphabetical order.
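For those who generate laundry lists with a script rather than a spreadsheet, the recommended column order can be produced with Python's standard `csv` module. The sketch below is a hedged example: `ordered_header` and `write_laundry_list` are hypothetical helper names, and the sample row is made up for illustration.

```python
import csv
import io

SPECIAL_FIELDS = ["content_type", "id", "source_path"]


def ordered_header(fields):
    """Special fields first, then all ordinary fields alphabetically."""
    special = [f for f in SPECIAL_FIELDS if f in fields]
    ordinary = sorted(f for f in fields if f not in SPECIAL_FIELDS)
    return special + ordinary


def write_laundry_list(rows):
    """Serialize a list of dicts (one per row) to CSV text."""
    fields = {name for row in rows for name in row}
    out = io.StringIO()
    # restval="" leaves a cell empty when a row lacks that field.
    writer = csv.DictWriter(out, fieldnames=ordered_header(fields), restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = [
    {
        "label": "My first deposited work",
        "content_type": "still_image",
        "source_path": "my_collection/work1",
        "id": "Sg9hYIISjRjlkP62",
    },
]
csv_text = write_laundry_list(rows)
print(csv_text.splitlines()[0])  # content_type,id,source_path,label
```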
Each subsequent row represents a resource (except in a multi-value case,
described below). The content_type field is mandatory for each resource.
Apart from the exceptions noted in "Fields with a special meaning" below, all fields are optional for the submission to be considered well-formed. However, some schema definitions may impose additional constraints, making certain fields mandatory or at least strongly recommended. This depends on the content model used. A well-formed submission will still fail if it does not satisfy a mandatory schema constraint.
Fields with a special meaning
content_type
Mandatory, single-valued.
It defines the content type assigned to the resource. For files, it must be
file or a sub-type thereof, except for inferred resources (see below). For
folders it must not be a file or sub-type. Consult the content model of your
archive for a list of defined type names.
id
Mandatory for resources being updated, single-valued.
For new resources it becomes the primary identifier, which is used anywhere information about the resource is retrieved.
The IDs generated by default by Pocket Archive are 16-character random strings containing only uppercase and lowercase letters and digits. The depositor is responsible for ensuring that a provided ID is unique across the system. If the field is left blank on a new resource, the system generates an identifier that is guaranteed to be unique. However, re-submitting the laundry list with the same blank field will create a duplicate resource; therefore, it is recommended to always fill this field in.
source_path
Mandatory for new files, single-valued.
It refers to the file or folder path relative to the package, using forward
slash / characters to separate folders and subfolders or files. It can be
omitted for files being updated, and for folders (descriptive resources).
If it is present on a file update and the file exists in the SIP, the file is
used to replace the archived file. If it is present and different from the
archived file's path, and it does not correspond to a file in the SIP, Pocket
Archive will only update the file path in the archived file metadata. This path
is used when rebuilding the SIP from the archive.
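The rules for source_path on a file update can be summarized as a small decision function. This is a reading of the paragraph above, not Pocket Archive code: the function name and the returned labels are hypothetical, and the last case (a path identical to the archived one with no matching file in the SIP) is not covered by the text, so treating it as "no change" is an assumption.

```python
def file_update_action(source_path, archived_path, sip_files):
    """Decide what happens to an archived file on update.

    source_path:   value in the laundry list (may be empty)
    archived_path: path currently stored in the file's metadata
    sip_files:     set of file paths actually present in the SIP
    """
    if not source_path:
        return "keep_file"      # field omitted: file is left untouched
    if source_path in sip_files:
        return "replace_file"   # file is in the SIP: replace archived file
    if source_path != archived_path:
        return "update_path"    # no file in the SIP: only the path changes
    return "keep_file"          # assumption: same path, no file, no change


sip = {"my_collection/work1/file1.tiff"}
print(file_update_action("my_collection/work1/file1.tiff", "old/file1.tiff", sip))
print(file_update_action("renamed/file2.pdf", "my_collection/work1/file2.pdf", sip))
```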
has_member
This behaves like all normal properties, but it has a special meaning when
deleting resources. If the --members option is provided, resources linked via
the has_member property to the resource being deleted are also deleted, along
with their own members, recursively. See the "Deleting resources" section
below.
Note: when a field is defined as "mandatory" above, this is intended per-resource. If the resource spans multiple rows, as when it has multi-valued fields, a mandatory field is only required to have a value on the first row of the resource.
Example of a table representing a work with two files:
| content_type | id | source_path | creation_date | label |
|---|---|---|---|---|
| still_image | Sg9hYIISjRjlkP62 | my_collection/work1 | 12-07-2002 | My first deposited work |
| still_image_file | 7hic19YTXA8Fudxo | my_collection/work1/file1.tiff | 09-22-2025 | |
| still_image_file | Z509TdNhpTjPYDS4 | my_collection/work1/file2.pdf | 09-23-2025 | |
Note the difference between the still_image and the still_image_file
resources. We will get back to it further down.
Multi-valued fields
Some fields may allow multiple values. To provide multiple values for one or
more fields, additional values are added to rows below the previous. For these
additional rows, the special fields content_type, id, and source_path
must not be filled.
Example of a table with a single resource with multi-valued fields:
| content_type | id | source_path | alt_label | description | label |
|---|---|---|---|---|---|
| still_image | Sg9hYIISjRjlkP62 | my_collection/work1 | An alternative label | A description of the work goes here. | This is the title and must have only one value. |
| | | | You can have as many as you like of these | Another description goes here. | |
| | | | FREE alt labels! (as long as supplies last) | | |
The submission process checks if the content_type field is filled in a cell
to determine whether a row in the table is a continuation from the previous
one, adding multiple values. Having a row without content type and with id
and/or source_path is considered an error.
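The continuation-row rule can be sketched with Python's `csv` module. This is a simplified illustration rather than the actual parser: `group_rows` is a hypothetical name, and each resource is represented as a dict mapping field names to lists of values.

```python
import csv
import io


def group_rows(rows):
    """Fold continuation rows into the resource that precedes them."""
    resources = []
    for n, row in enumerate(rows, start=2):  # row 1 is the header
        if row.get("content_type"):
            # A filled content_type starts a new resource.
            resources.append({k: [v] for k, v in row.items() if v})
            continue
        if row.get("id") or row.get("source_path"):
            raise ValueError(f"row {n}: id or source_path without content_type")
        if not resources:
            raise ValueError(f"row {n}: continuation row before any resource")
        for key, value in row.items():
            if value:
                resources[-1].setdefault(key, []).append(value)
    return resources


laundry_list = """\
content_type,id,source_path,alt_label,description,label
still_image,Sg9hYIISjRjlkP62,my_collection/work1,An alternative label,A description.,The title
,,,Another alt label,Another description.,
,,,A third alt label,,
"""

resource = group_rows(csv.DictReader(io.StringIO(laundry_list)))[0]
print(len(resource["alt_label"]))    # 3
print(len(resource["description"]))  # 2
```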
Ordering
The ordering of rows in a laundry list determines the ordering of the resources
in their container. The system automatically assigns an order to the resources,
using their source path and their position in the laundry list. Resources at
the top level, i.e. directly under the SIP folder, are not assigned an order, as
they are considered self-standing. If an order is needed for those, the
next property can be set to the desired resource (see point below
about relationships), or they can be put in an enclosing folder that acts as a
collection.
Relationships can be established between resources. These are stored as persistent links and appear as hyperlinks in the discovery interface. A relationship can only be set for a field that is configured as "resource" type. Consult your content model to find which properties are relationships.
To set a relationship with a resource in the same laundry list that doesn't have an explicit ID set, insert the source path of the resource. For a resource that already has an ID, either because one was assigned manually or because the resource has already been deposited, insert the ID string.
Example table with implicit and explicit relationships, some path-based and some ID-based:
| content_type | id | source_path | has_member | label |
|---|---|---|---|---|
| collection | p9tXQGBb9iC7xEqm | my_collection-1 | | This collection has implicit members from the folder hierarchy. |
| still_image | KHwYidw4R7xUAEMN | my_collection-2/image001 | | Resource with an explicit ID. The ID can be used in a reference. |
| text | | my_collection-2/text0001 | | Resource without explicit ID. It can be referenced by source_path. |
| collection | EUXRg9igmU9ouzVH | my_collection-2 | p9tXQGBb9iC7xEqm | This collection has explicit member relationships. |
| | | | my_collection-2/text0001 | |
When the laundry list is processed for submission, the path-based references are replaced with IDs, which are automatically generated where not provided. Therefore, a laundry list generated from archived resources may look different from the original one. The generated laundry list should be used for re-submission.
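The replacement of path-based references can be pictured as a two-pass process: first every resource gets an ID, then any reference that matches a known source path is swapped for the corresponding ID. The sketch below is a guess at the general approach, not the actual implementation; `resolve_references`, `new_id`, and the list-of-dicts representation are hypothetical.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def new_id():
    """16 random characters drawn from letters and digits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(16))


def resolve_references(resources, link_fields=("has_member",)):
    """Replace path-based references with IDs, generating missing IDs."""
    path_to_id = {}
    for res in resources:  # pass 1: make sure every resource has an ID
        if not res.get("id"):
            res["id"] = new_id()
        if res.get("source_path"):
            path_to_id[res["source_path"]] = res["id"]
    for res in resources:  # pass 2: swap known paths for IDs
        for field in link_fields:
            if field in res:
                res[field] = [path_to_id.get(v, v) for v in res[field]]
    return resources


resources = [
    {"content_type": "text", "id": "", "source_path": "my_collection-2/text0001"},
    {
        "content_type": "collection",
        "id": "EUXRg9igmU9ouzVH",
        "source_path": "my_collection-2",
        "has_member": ["p9tXQGBb9iC7xEqm", "my_collection-2/text0001"],
    },
]
resolve_references(resources)
print(resources[1]["has_member"])  # both entries are now IDs
```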
Resource types and sub-types
This section is a very concise introduction to content modeling in Pocket Archive, which is treated in detail in the Content modeling introduction. It is strongly recommended to read that guide before archiving resources in earnest.
The three main resource types found in a submission are: Work, File, and Brick. See the Content modeling introduction for more information about these.
These three key content types are seldom used as-is. They usually have sub-types, which are defined in the content model. See the content modeling guide for more information about sub-types.
Types provided by Pocket Archive may have similar names but different uses. For
example, the still_image type, a sub-type of work, designates a visual
object, e.g. a photograph. still_image_file may be the capture (e.g. scan) of
that object, but also the capture of a text work if it is the scan of a
book page.
See the provided sample laundry list for examples of works, files, and bricks making up a two-sided postcard.
Submission ID and submission name
Each submission gets a randomly generated ID when it starts. This ID is attached to all the resources in the submission. This makes it easier to find out later on when and how a certain resource was submitted. It also makes it possible to generate a laundry list that contains all the resources of the original submission.
The ID is automatically generated and system-controlled. It cannot be changed.
A submission can also have a name, which is optional and user-defined. The
submission name is determined by the file name used for the laundry list. E.g.
pkar_submission-my_new_collection.csv will use my_new_collection, i.e. the
text between pkar_submission- and .csv, as the submission name. Submission
names are not required to be unique. Of course, the laundry list file names
must be unique in the drop box they are deposited to.
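The naming rule amounts to stripping a fixed prefix and suffix from the file name. A minimal sketch, in which the function name is hypothetical and returning `None` for non-matching or empty names is an assumption about edge cases the text does not cover:

```python
from pathlib import PurePath

PREFIX = "pkar_submission-"
SUFFIX = ".csv"


def submission_name(filename):
    """Extract the user-defined submission name from a laundry list file name."""
    name = PurePath(filename).name
    if name.startswith(PREFIX) and name.endswith(SUFFIX):
        # Assumption: an empty name between prefix and suffix means "no name".
        return name[len(PREFIX):-len(SUFFIX)] or None
    return None


print(submission_name("pkar_submission-my_new_collection.csv"))  # my_new_collection
```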
Updating resources
A submission is also used to update existing resources. Each resource update is a full replacement of all the resource's metadata, so a submission must include a full representation of each of the resources updated.
Any single submission can contain a mix of new and updated resources. If the correct fields are provided (see "Fields with special meaning" above), Pocket Archive will know which is which.
To facilitate this task while avoiding the need to hold on to all of the archive's laundry lists, Pocket Archive can generate a laundry list for one or more selected resources. This list, which represents the current state of the resources requested, can be edited and re-submitted for an update. Read the Admin interface document for further information.
The administrative interface, if enabled, also has a facility to inspect and update an individual resource, because performing such one-off tasks using only submissions via laundry lists could become tedious.
Deleting resources
Although some archivists advise against deleting anything from an archive, Pocket Archive acknowledges that in real life things may actually need to be removed. The cause may be a duplicate, something that was not supposed to be archived, etc. In any case, the resource-conservative alignment of Pocket Archive supports deleting resources immediately and irreversibly. Versioning and "soft" deletion, which keep prior states of resources including deleted ones, are not supported.
A resource can be deleted via the pkar remove CLI method, or by uploading a
special file to the drop box, named pkar_remove* (asterisk means zero or
more characters—note that the file name does not need an extension). The delete
file must be a list of archival IDs, in the short URI form (par:<ID>),
one per line.
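A delete file is simple enough to generate with a script. The sketch below writes one `par:<ID>` entry per line to a file whose name starts with `pkar_remove`. The helper name `write_removal_file`, the optional suffix, and the trailing newline are assumptions made for illustration; a temporary directory stands in for the real drop box.

```python
import tempfile
from pathlib import Path


def write_removal_file(drop_box, ids, suffix=""):
    """Write archival IDs in short URI form, one per line."""
    path = Path(drop_box) / f"pkar_remove{suffix}"
    path.write_text("".join(f"par:{rid}\n" for rid in ids), encoding="utf-8")
    return path


# A temporary directory is used here in place of the actual drop box.
with tempfile.TemporaryDirectory() as drop_box:
    path = write_removal_file(
        drop_box, ["Sg9hYIISjRjlkP62", "7hic19YTXA8Fudxo"], "-duplicates"
    )
    print(path.name)
    print(path.read_text(encoding="utf-8"), end="")
```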
If pkar_watch, the process watching the drop box, was started with the -r
option, all members of the resources are recursively deleted (this means also
members of members). This is set by the system administrator and is applied
to all deletions. It cannot be overridden for individual deletion requests.
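To get a feel for what a recursive deletion would cover, the set of affected resources can be computed by walking the has_member links. This is a conceptual sketch with hypothetical names and made-up IDs; in particular, the `has_member` mapping is a plain dict standing in for the archive's metadata.

```python
def collect_members(root_id, has_member):
    """Return root_id plus all its members, members of members, and so on."""
    found = []
    visited = set()
    stack = [root_id]
    while stack:
        rid = stack.pop()
        if rid in visited:  # guards against circular membership
            continue
        visited.add(rid)
        found.append(rid)
        stack.extend(has_member.get(rid, []))
    return found


has_member = {
    "collection": ["work1", "work2"],
    "work1": ["file1", "file2"],
    "work2": ["file3"],
}
print(sorted(collect_members("collection", has_member)))
```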
Advanced techniques
Some hidden tricks can be employed to facilitate the creation and management of larger submissions.
Implicit resources
TODO
Bulk ID generation
As mentioned before, explicitly adding IDs in a laundry list simplifies later editing and management. However, this is one of the most tedious parts of a laundry list creation.
Fortunately, such repetitive and error-prone tasks can be easily automated with tools provided by most spreadsheet applications. A macro (a mini-program that runs in an application) for LibreOffice Calc is provided here to automatically generate 16-character IDs for all the cells selected in a table.
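For those who prefer a script to a spreadsheet macro, the same task can be sketched in Python. This is an independent illustration, not the macro mentioned above: `fill_ids` and `new_id` are hypothetical names. It fills the id cell only on rows that start a resource (those with a content_type), leaving continuation rows blank.

```python
import csv
import io
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def new_id():
    """16 random characters drawn from letters and digits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(16))


def fill_ids(csv_text):
    """Return the laundry list with blank IDs filled in."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for row in rows:
        # Continuation rows have no content_type and must keep id empty.
        if row["content_type"] and not row["id"]:
            row["id"] = new_id()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


laundry_list = """\
content_type,id,source_path,label
still_image,Sg9hYIISjRjlkP62,my_collection/work1,Existing ID is kept
text,,my_collection/text0001,Gets a new ID
,,,A second label for the same resource
"""
print(fill_ids(laundry_list))
```

Random IDs of this length are very unlikely to collide, but the script does not check uniqueness against the archive; that remains the depositor's responsibility.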