Content model setup

The content model governs all of Pocket Archive's content input and output. It validates incoming submissions, verifying that contents and their metadata are entered according to specific criteria, and provides additional information for Linked Data export and for the user-facing presentation.

For a generic introduction to content modeling in Pocket Archive, see the content modeling guidelines.

Writing content model configuration

The suggested method for starting a new content model is to copy the configuration files found in config/model/schema and adjust them to one's needs. Advanced users can write a whole content model from scratch, using only the core schema (see below) as the base.

Inheritance

Schema definitions follow an inheritance model, whereby a schema has a broader attribute that defines its super-type, from which it inherits all property definitions. Note that a schema doesn't inherit any top-level attributes: these MUST be defined for, and are specific to, each content type.

A parent property may be redefined by adding or replacing attributes. All values of a multi-valued attribute are replaced completely. For example, suppose the my_work schema has a property has_file of type resource with a range of {image_file = true}. If a sub-type of it, e.g., my_other_work, wants to add text_file to the range of has_file, it should specify the full set {image_file = true, text_file = true} in the override definition.
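
As a rough sketch of such an override, assuming property definitions are keyed by property codename inside the properties table (as described under Property definitions), a hypothetical my_other_work.lua might look like the following. The pas:MyOtherWork URI and the label are made up for illustration, and range is marked elsewhere in this document as not yet implemented.

```lua
-- my_other_work.lua: hypothetical sub-type of my_work (names are illustrative).
return {
    uri = "pas:MyOtherWork",
    label = "My other work",
    broader = "my_work",
    properties = {
        -- Only the overridden attribute is restated here; the remaining
        -- attributes of has_file are assumed to be inherited from my_work.
        has_file = {
            -- The whole set is replaced, so image_file must be listed again.
            range = {image_file = true, text_file = true},
        },
    },
}
```

Leaving image_file out of the override would, by the replacement rule above, remove it from the allowed targets for my_other_work.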

Core schema and predefined content types

The content model is defined by configuration files in the schema folder. Each file defines one content type.

Some content types are considered part of the core functionality of Pocket Archive. These are defined by the configurations in core_schema, which comes with the Pocket Archive installation and cannot be altered. The core types include the foundational types, such as resource, work, file, etc.

The core types are extensible in the user configuration by creating sub-types. All user-defined schemas must have file names that do not conflict with the core type names.

Pocket Archive ships with a sample configuration with commonly used content types. For some very simple archives, this may be enough to get started with little or no customization. Setups that need more numerous or more complex content types can define additional types, using the default model configuration files that come with Pocket Archive as a reference.

Each type definition is encoded in its own configuration file. One doesn't have to define all possible types in detail. Pocket Archive provides some basic types, namely: resource (the super-class of them all), container, file, collection. To add more specific definitions, subtypes can be defined. A subtype inherits all the property definitions of its broader model, and adds more specific behavior. An example classification could be: Resource → File → Image File → Scientific Image. Each of the sub-types would only define the special properties of that definition, which add to, or replace, the properties of its broader definitions.
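
To illustrate the last level of that example classification, here is a minimal sketch of what a scientific_image.lua file could contain. The codename image_file, the pas:ScientificImage URI, and the instrument property are all assumptions made up for this example.

```lua
-- scientific_image.lua: hypothetical leaf of
-- Resource → File → Image File → Scientific Image.
return {
    uri = "pas:ScientificImage",
    label = "Scientific image",
    broader = "image_file",  -- assumed codename of the Image File type
    properties = {
        -- Only what is specific to scientific images is defined here;
        -- everything else is inherited from the broader types.
        instrument = {
            uri = "pas:instrument",
            label = "Instrument",
            description = "Device used to capture the image.",
        },
    },
}
```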

All resources in Pocket Archive must be assigned a content type. If someone has to deal with a resource that doesn't fit in any of the predefined content models, they can assign it the most specific type that they can. At worst, they can put it under Resource. Of course, if one starts dealing with many unclassifiable resources that look similar, it's probably best to define a content type for them; but that is not mandatory.

In addition to its mandatory attributes, a property can have an optional description. This, when present, shows in the presentation as an "info" icon and a pop-up with a concise explanation of the property's scope and purpose.

Constraints

Each property can be assigned constraints on:

Type: the data type for the field, e.g. string, number, resource (relationship), etc.
Cardinality: how many values can be set for a field, for each resource. These values can be adjusted to set mandatory fields, single-valued fields, etc.
Range: the range of values allowed. How this is interpreted depends on the data type.

It is advisable to keep constraints, ahem, constrained. An excessively prescriptive content model can become frustrating and even unusable for users.

Detailed definitions of property constraints are in the Property definitions section.

The schema configuration file

A schema definition file MUST be placed inside $PKAR_CONFIG_DIR/schema/ and named with the schema codename plus the .lua suffix.

Every schema definition MUST end with a return {...} statement, which includes the complete configuration of that schema.

{...} is a Lua table, containing some mandatory and some optional keys as defined below:

`uri`

Mandatory: yes
Type: string

This is the URI used to identify the content type ("class") in RDF. An RDF convention is to use upper camel case for these URIs (e.g., pas:MyNewType), but if one adopts an existing vocabulary, the naming scheme of that vocabulary should be followed.

`label`

Mandatory: yes
Type: string

This is the human-readable label that appears in the presentation and in other human-readable contexts. It should be a clear and very concise title for what the type represents.

`description`

Mandatory: no
Type: string

Optional, more extended description of the scope of the content type.

This field is displayed in the presentation and should only contain information for the end user. For notes to the catalogers, use the notes field.

`notes`

Mandatory: no
Type: Sequence (table) of strings

Optional notes for the content managers.

`broader`

Mandatory: yes
Type: string

Codename of the content type that this type inherits from. Creating a type that is not a sub-type of anything (i.e., a top-level type) is not allowed.

`properties`

Mandatory: no
Type: table

List of properties specific to this type and its sub-types. See Property definitions.

`default_fmodel`

Mandatory: no
Type: string
Applicable to: container

This determines the content type of automatically generated file members. If not specified, the generic file is used.

`gen`

Mandatory: no
Type: table
Applicable to: file

Generator options. This section is used to define the transform processes that generate presentation derivatives and thumbnails for this type of file. See generator definitions.
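
Putting the top-level keys together, a schema file for a container sub-type might be sketched as follows. The file name, URI, texts, and the image_file codename are hypothetical; gen is left out because generator options are not documented yet.

```lua
-- photo_album.lua: hypothetical container sub-type (all names are illustrative).
return {
    uri = "pas:PhotoAlbum",
    label = "Photo album",
    -- Shown to end users in the presentation.
    description = "A bound or digital album of photographs.",
    -- For content managers only.
    notes = {
        "Use one album per physical volume.",
    },
    broader = "container",
    -- Automatically generated file members get this type instead of the
    -- generic file. Assumes an image_file type exists in the model.
    default_fmodel = "image_file",
    -- The optional properties table is omitted in this sketch.
}
```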

Property definitions

Properties should be defined as accurately as practically possible. A good content model configuration (which means, carefully organized, neither too strict nor too vague, and exhaustively annotated with labels and descriptions) can greatly improve the quality of the archive data while facilitating the archivist's job.

Pocket Archive will not accept any properties with undefined codenames in a submission.

Each property has the following attributes:

`[property_id].uri`

Mandatory: yes
Type: string

URI of the property. This SHOULD be unique within the content model. It MUST be a namespace-prefixed string using prefixes from the configured namespace map — e.g., pas:contentType. Content model designers can choose the ontologies they want to use for their naming strategy.

`[property_id].label`

Mandatory: yes
Type: string

Human-readable label used for display in presentation and other user-facing platforms.

`[property_id].description`

Mandatory: no
Type: string
Default: (none)

Longer description of the property. It is displayed as a tooltip text in the presentation. This may be useful for end users to better understand the scope and purpose of the property.

This field should only contain information for the end user. For notes to the catalogers, use the notes field.

`[property_id].notes`

Mandatory: no
Type: Sequence (table) of strings
Default: (none)

Optional notes for the content managers.
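
The descriptive attributes covered so far could be combined as in the fragment below. It is only a sketch: the creator codename, its URI, and the texts are invented for illustration.

```lua
-- Fragment of the table returned by a schema file.
properties = {
    creator = {
        uri = "pas:creator",
        label = "Creator",
        -- Tooltip text for end users.
        description = "Person or organization primarily responsible for the work.",
        -- Guidance for content managers.
        notes = {
            "Prefer the authorized form of the name.",
        },
    },
}
```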

`[property_id].type`

Mandatory: no
Type: string
Default: "string"

Data type for the property. It MAY be one of:

string: a generic string. This is the default if not defined.
url: a string that will display as a link in the presentation. Pocket Archive does not check that the URL is well-formed or that it points to an existing resource.
integer: an integer number.
decimal: a number with a limited decimal part.
float: a floating-point number.
date: a date value in the YYYY-MM-DD format.
datetime: a date value, optionally including time. It SHOULD be in the ISO 8601 format.
timestamp: technically an integer number representing the seconds elapsed since the Unix Epoch. Used mainly for system-managed values.
resource: an internal Pocket Archive resource. This behaves similarly to url, except that the value MUST be a UID in the same archive, and the consistency of the link is always checked.
structured: a type that allows a more complex data structure. [WIP note: not yet implemented]

Each data type may be mapped to an RDF data type in the configuration, under md.datatypes. E.g., in the default configuration:

datatypes = {
    integer = "xsd:integer",
    decimal = "xsd:decimal",
    float = "xsd:double",
    boolean = "xsd:boolean",
    datetime = "xsd:datetime",
    url = "xsd:anyURI",
}

Unmapped data types map to xsd:string. String data types MAY be multi-lingual (see the "flags" section below).

New data types may be defined in the local model and mapped to specific RDF types. Pocket Archive treats them as plain strings.
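
As a sketch of how a local data type could be introduced, the fragments below add a hypothetical year type mapped to xsd:gYear and use it in a property. The type name, the property, and the exact placement of the fragments are assumptions.

```lua
-- In the configuration, under md.datatypes (default mappings omitted here):
datatypes = {
    year = "xsd:gYear",  -- hypothetical local data type
}

-- In a schema file, a property using the new type:
properties = {
    publication_year = {
        uri = "pas:publicationYear",
        label = "Publication year",
        -- Treated as a plain string by Pocket Archive; the mapping above
        -- only determines the RDF data type.
        type = "year",
    },
}
```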

`[property_id].max_cardinality`

Mandatory: no
Type: integer
Default: (unbounded)

Maximum number of values that a property can have on any single resource. A value of 1 means that the property is single-valued.

`[property_id].min_cardinality`

Mandatory: no
Type: integer
Default: 0

Minimum number of values that a property must have on any single resource. A value of 1 means that the property is mandatory.
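
The two cardinality attributes can be combined to express the usual cases. The property codenames and URIs in this sketch are hypothetical.

```lua
-- Fragment of the table returned by a schema file.
properties = {
    -- Mandatory and single-valued: exactly one value per resource.
    identifier = {
        uri = "pas:identifier",
        label = "Identifier",
        min_cardinality = 1,
        max_cardinality = 1,
    },
    -- Mandatory and repeatable: at least one value, no upper bound.
    contributor = {
        uri = "pas:contributor",
        label = "Contributor",
        min_cardinality = 1,
    },
    -- Optional and repeatable: both attributes left at their defaults.
    subject = {
        uri = "pas:subject",
        label = "Subject",
    },
}
```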

`[property_id].range`

Mandatory: no
Type: (variable)
Default: (none)

[WIP note: Not implemented yet] Range of accepted values. Its interpretation depends on the data type of the property:

string: a regular expression string that the input values must match.
number: a min/max range in the form of a table: {min, max}.
relationship: a set (table) with one or more content type IDs as keys, representing the allowed content types of the relationship target, e.g., {still_image_file = true, document_file = true}.
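
Since range is not implemented yet, the following is only a sketch of how the three interpretations might be written. The property names, URIs, and the pattern are hypothetical, and the regular expression flavor is not specified in this document.

```lua
-- Fragment of the table returned by a schema file.
properties = {
    -- String: input values must match the pattern.
    call_number = {
        uri = "pas:callNumber",
        label = "Call number",
        type = "string",
        range = "^[A-Z]+[0-9]+$",
    },
    -- Number: {min, max}.
    page_count = {
        uri = "pas:pageCount",
        label = "Page count",
        type = "integer",
        range = {1, 10000},
    },
    -- Relationship: set of allowed content types for the target.
    has_file = {
        uri = "pas:hasFile",
        label = "Has file",
        type = "resource",
        range = {still_image_file = true, document_file = true},
    },
}
```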

`[property_id].flags`

Mandatory: no
Type: bitmask
Default: 0 (no flags)

Special handling flags for the application. The value of this attribute is a bitmask of the following constants:

PROP_PROTECTED: the property is entirely system managed and cannot be modified by the cataloger. An example of this is last_modified. This flag SHOULD NOT be set in user-defined schemata.
PROP_NO_UPDATE: the property can be set on resource creation, but it won't be changed afterwards, by user or system. An example of this is content_type.
PROP_NO_DELETE: properties are normally cleared of all pre-existing values before applying an update, so that the update completely overwrites all the properties and their values. However, a property that has this flag set will never have its previous values cleared, and these will accumulate with subsequent updates. An example of this is sub_id.
PROP_NO_MDLIST: the property will not be displayed in the default metadata list in the presentation pages. It is still passed to the template and can be used by custom page templates.
PROP_INDEX: include this property in the search index and in the list of search terms.
PROP_MULTILANG: a property that can have values in multiple languages. The property type MUST be string. The string is entered in the laundry list with or without a language tag suffix. NOTE: this disables the maximum cardinality constraint check. To indicate a "default" value for properties that are expected to be single-valued, e.g. a work's label when displayed as a title, enter one value without the language tag. This is also a standard Linked Data way to request the default representation of a string.
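
Because flags is a bitmask, several flags are combined with a bitwise OR. The sketch below assumes Lua 5.3 or later for the | and & operators. The numeric values are stand-ins so that the example runs on its own; they are not the real values, and how Pocket Archive exposes the constants to schema files is not covered here.

```lua
-- Stand-in values for illustration only (NOT the real ones). In a schema
-- file, use the constants supplied by Pocket Archive instead.
local PROP_INDEX = 1 << 0
local PROP_MULTILANG = 1 << 1

local properties = {
    title = {
        uri = "pas:title",  -- hypothetical property
        label = "Title",
        type = "string",    -- PROP_MULTILANG requires the string type
        -- Searchable and multilingual: combine both flags.
        flags = PROP_INDEX | PROP_MULTILANG,
    },
}

-- A flag is set when AND-ing it with the bitmask gives a non-zero result.
assert((properties.title.flags & PROP_MULTILANG) ~= 0)
assert((properties.title.flags & PROP_INDEX) ~= 0)
```

A bitmask keeps the attribute compact and lets new flags be added without changing the shape of property definitions.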

Generator definitions

TODO Generators are still experimental. More information will be added when they are stabilized.