Content model setup
The content model governs all of Pocket Archive's content input and output. It validates incoming submissions, verifying that contents and their metadata are entered according to specific criteria, and provides additional information for Linked Data export and for the user-facing presentation.
For a generic introduction to content modeling in Pocket Archive, see the content modeling guidelines.
Writing content model configuration
The suggested method for starting a new content model is to copy the
configuration files found in config/model/schema and adjust them to one's
needs. Advanced users can write a whole content model from scratch, using only
the core schema (see below) as the base.
Inheritance
Schema definitions follow an inheritance model, whereby a schema has a
broader attribute that defines its super-type, from which it inherits all
property definitions. Note that it doesn't inherit any top-level attributes,
which MUST be defined for, and are specific to, each content type.
A parent property may be redefined by adding or replacing attributes. All
values of a multi-valued attribute are replaced completely. For example,
suppose the my_work schema has a property has_file of type resource with a
range of {image_file = true}. If a sub-type of my_work, e.g.,
my_other_work, wants to add text_file to the range of has_file, it
should specify {image_file = true, text_file = true} in the override
definition.
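As a minimal sketch of the override described above, a hypothetical my_other_work.lua could look like the following. The URI, the label, and the assumption that properties are keyed by their codename inside the properties table are illustrative and not taken from the shipped configuration.

```lua
-- my_other_work.lua (hypothetical sub-type of my_work)
return {
    -- Top-level attributes are not inherited, so they are stated explicitly.
    uri = "pas:MyOtherWork",    -- illustrative URI
    label = "My other work",    -- illustrative label
    broader = "my_work",

    properties = {
        -- Only the attribute being changed is restated; the remaining
        -- attributes of has_file are assumed to come from my_work.
        has_file = {
            -- A multi-valued attribute is replaced as a whole, so the
            -- inherited image_file entry has to be repeated here.
            range = {image_file = true, text_file = true},
        },
    },
}
```

Omitting image_file from the override would, by the replacement rule above, leave text_file as the only allowed target type.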
Core schema and predefined content types
The content model is defined by configuration files in the schema folder.
Each file defines one content type.
Some content types are considered part of the core functionality of Pocket
Archive. These are defined by the configurations in core_schema, which comes
with the Pocket Archive installation and cannot be altered. The core
types include foundational types such as resource, work, and file.
The core types are extensible in the user configuration by creating sub-types. All user-defined schemas must have file names that do not conflict with the core type names.
Pocket Archive ships with a sample configuration with commonly used content types. For some very simple archives, this may be enough to get started with little or no customization. For a setup which needs to define more numerous or complex content types in a more articulated way, additional types can be defined. Please look at the default model configuration files that come with Pocket Archive.
Each type definition is encoded in a configuration file defining a single
content type. One doesn't have to define all possible types in detail.
Pocket Archive provides some basic types, namely resource (the super-class of
them all), container, file, and collection. To add more specific definitions,
subtypes can be defined. A subtype inherits all the property definitions of
its broader model, and adds more specific behavior. An example classification
could be: Resource → File → Image File → Scientific Image. Each of the
sub-types would only define the special properties of that definition, which
add to, or replace, the properties of its broader definitions.
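To illustrate the last step of such a chain, the sketch below shows what a hypothetical scientific_image.lua might contain. It assumes an image_file type is already defined as a sub-type of file; the codenames, URIs, and the instrument property are made up for the example.

```lua
-- scientific_image.lua (hypothetical)
return {
    uri = "pas:ScientificImage",
    label = "Scientific image",
    -- Assumes image_file exists and is itself a sub-type of file.
    broader = "image_file",

    -- Only what is special to scientific images is defined here; all other
    -- properties come from image_file, file, and resource.
    properties = {
        instrument = {
            uri = "pas:instrument",
            label = "Instrument",
            description = "Device used to capture the image.",
        },
    },
}
```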
All resources in Pocket Archive must be assigned a content type. If someone has to deal with a resource that doesn't fit in any of the predefined content models, they can assign it the most specific type that they can. At worst, they can put it under Resource. Of course, if one starts dealing with many unclassifiable resources that look similar, it's probably best to define a content type for them; but that is not mandatory.
In addition to its mandatory attributes, a property can have an optional description. When present, this shows in the presentation as an "info" icon with a pop-up giving a concise explanation of the property's scope and purpose.
Constraints
Each property can be assigned constraints on:
- Type: the data type for the field, e.g. string, number, resource (relationship), etc.
- Cardinality: how many values can be set for a field, for each resource. These values can be adjusted to set mandatory fields, single-valued fields, etc.
- Range: the range of values allowed. How this is interpreted depends on the data type.
It is advisable to keep constraints, ahem, constrained. An excessively prescriptive content model can become frustrating and even unusable for users.
Detailed definitions of property constraints are in the Property definitions section.
The schema configuration file
A schema definition file MUST be placed inside $PKAR_CONFIG_DIR/schema/ and
named with the schema codename plus the .lua suffix.
Every schema definition MUST end with a return {...} statement, which
includes the complete configuration of that schema.
{...} is a Lua table, containing some mandatory and some optional keys as
defined below:
uri
Mandatory: yes
Type: string
This is the URI used to identify the content type ("class") in RDF. An RDF
convention is to use an upper-camel cased notation for these URIs (e.g.,
pas:MyNewType), but if one adopts an existing vocabulary, the naming scheme
for that vocabulary should be followed.
label
Mandatory: yes
Type: string
This is the human-readable label that appears in the presentation and in other human-readable contexts. It should be a clear and very concise title for what the type represents.
description
Mandatory: no
Type: string
Optional, more extended description about the scope of the content type.
This field is displayed in the presentation and should only contain information
for the end user. For notes to the catalogers, use the notes field.
notes
Mandatory: no
Type: Sequence (table) of strings
Optional notes for the content managers.
broader
Mandatory: yes
Type: string
Codename of the content type that this type inherits from. Creating a type that is not a sub-type of anything (i.e., a top-level type) is not allowed.
properties
Mandatory: no
Type: table
List of properties specific to this type and its sub-types. See Property definitions.
default_fmodel
Mandatory: no
Type: string
Applicable to: container
This determines the content type of automatically generated file members.
If not specified, the generic file is used.
gen
Mandatory: no
Type: table
Applicable to: file
Generator options. This section is used to define the transform processes that generate presentation derivatives and thumbnails for this type of file. See generator definitions.
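Putting the top-level keys together, a schema file for a container sub-type might look roughly like the sketch below. The photo_album codename, the URIs, the text values, and the image_file type are hypothetical. The gen key is left out because generators are still experimental, and properties are assumed to be keyed by their codename.

```lua
-- photo_album.lua (hypothetical codename; the file name matches the codename)
return {
    uri = "pas:PhotoAlbum",     -- RDF class URI, upper-camel cased
    label = "Photo album",      -- short title shown in the presentation
    description = "A bound or digital collection of photographs.",
    notes = {
        -- For content managers only; a sequence of strings.
        "Use this type only for albums described as a single unit.",
    },
    broader = "container",      -- mandatory: every type needs a super-type
    -- Content type for automatically generated file members
    -- (assumes an image_file type is defined).
    default_fmodel = "image_file",
    properties = {
        compiler = {
            uri = "pas:compiler",
            label = "Compiler",
        },
    },
}
```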
Property definitions
Properties should be defined as accurately as practically possible. A good content model configuration (that is, one that is carefully organized, neither too strict nor too vague, and exhaustively annotated with labels and descriptions) can greatly improve the quality of the archive data while facilitating the archivist's job.
Pocket Archive will not accept any properties with undefined codenames in a submission.
Each property has the following attributes:
[property_id].uri
Mandatory: yes
Type: string
URI of the property. This SHOULD be unique within the content model. It MUST
be a namespace-prefixed string using prefixes from the configured namespace
map — e.g., pas:contentType. Content model designers
can choose the ontologies they want to use for their
naming strategy.
[property_id].label
Mandatory: yes
Type: string
Human-readable label used for display in presentation and other user-facing platforms.
[property_id].description
Mandatory: no
Type: string
Default: (none)
Longer description of the property. It is displayed as a tooltip text in the presentation. This may be useful for end users to better understand the scope and purpose of the property.
This field should only contain information for the end user. For notes to the
catalogers, use the notes field.
[property_id].notes
Mandatory: no
Type: Sequence (table) of strings
Default: (none)
Optional notes for the content managers.
[property_id].type
Mandatory: no
Type: string
Default: "string"
Data type for the property. It MAY be one of:
- string: a generic string. This is the default if not defined.
- url: a string that will display as a link in the presentation. Pocket Archive does not check that the URL is well-formed or that it points to an existing resource.
- integer: an integer number.
- decimal: a number with a limited decimal part.
- float: a floating-point number.
- date: a date value in the YYYY-MM-DD format.
- datetime: a date value, optionally including time. It SHOULD be in the ISO 8601 format.
- timestamp: technically an integer number representing the seconds elapsed since the Unix Epoch. Used mainly for system-managed values.
- resource: an internal Pocket Archive resource. This behaves similarly to url, except that the value MUST be a UID in the same archive, and the consistency of the link is always checked.
- structured: a type that allows a more complex data structure. [WIP note: not yet implemented]
Each data type may be mapped to an RDF data type in the configuration, under
md.datatypes. E.g., in the default configuration:
datatypes = {
    integer = "xsd:integer",
    decimal = "xsd:decimal",
    float = "xsd:double",
    boolean = "xsd:boolean",
    datetime = "xsd:datetime",
    url = "xsd:anyURI",
}
Unmapped data types map to xsd:string. String data types MAY be
multi-lingual (see the "flags" section below).
New data types may be defined in the local model and mapped to specific RDF types. Pocket Archive treats them as plain strings.
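A locally defined data type could be declared and used along the following lines. The year type, its mapping to xsd:gYear, and the year_issued property are hypothetical, and both tables are shown as fragments of their enclosing configuration rather than as complete files.

```lua
-- Fragment of the md section of the main configuration.
datatypes = {
    -- (default mappings omitted for brevity)
    year = "xsd:gYear",     -- hypothetical custom data type
}

-- Fragment of a schema definition that uses the new type.
properties = {
    year_issued = {
        uri = "pas:yearIssued",
        label = "Year issued",
        -- Exported with the mapped RDF type, but otherwise handled
        -- by Pocket Archive as a plain string.
        type = "year",
    },
}
```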
[property_id].max_cardinality
Mandatory: no
Type: integer
Default: (unbounded)
Maximum number of values that a property can have on any single resource. A value of 1 means that the property is single-valued.
[property_id].min_cardinality
Mandatory: no
Type: integer
Default: 0
Minimum number of values that a property must have on any single resource. A value of 1 means that the property is mandatory.
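The two cardinality attributes combine as in the sketch below, which is a fragment of a schema's properties table. The property codenames and URIs are illustrative only.

```lua
properties = {
    -- Exactly one value: mandatory and single-valued.
    title = {
        uri = "pas:title",
        label = "Title",
        min_cardinality = 1,
        max_cardinality = 1,
    },
    -- One or more values: mandatory, with no upper bound.
    creator = {
        uri = "pas:creator",
        label = "Creator",
        min_cardinality = 1,
    },
    -- Zero or one value: optional and single-valued.
    edition = {
        uri = "pas:edition",
        label = "Edition",
        max_cardinality = 1,
    },
}
```

A property that sets neither attribute is optional and can hold any number of values.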
[property_id].range
Mandatory: no
Type: (variable)
Default: (none)
[WIP note: Not implemented yet] Range of accepted values. This has
different interpretations depending on the data type of the property: for a
string it is a regular expression string that the input values must match; for
a number, a min/max range in the form of a table: {min, max}; for a
relationship, a set (table) with one or more content type IDs as keys
representing allowed content types of the relationship target, e.g.,
{still_image_file = true, document_file = true}.
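The three interpretations of range can be sketched as follows. Since range is not implemented yet, this only illustrates the intended shape; the property codenames and URIs are hypothetical, and the regular expression flavor is not specified here, so the pattern sticks to syntax common to most flavors.

```lua
properties = {
    -- String: a regular expression that each value must match
    -- (here, two capital letters followed by four digits).
    call_number = {
        uri = "pas:callNumber",
        label = "Call number",
        type = "string",
        range = "^[A-Z][A-Z][0-9][0-9][0-9][0-9]$",
    },
    -- Number: a {min, max} table.
    page_count = {
        uri = "pas:pageCount",
        label = "Page count",
        type = "integer",
        range = {1, 5000},
    },
    -- Relationship: a set of allowed target content types.
    has_file = {
        uri = "pas:hasFile",
        label = "Has file",
        type = "resource",
        range = {still_image_file = true, document_file = true},
    },
}
```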
[property_id].flags
Mandatory: no
Type: bitmask
Default: 0 (no flags)
Special handling flags for the application. The value of this attribute is a bitmask of the following constants:
- PROP_PROTECTED: the property is entirely system managed and cannot be modified by the cataloger. An example of this is last_modified. This flag SHOULD NOT be set in user-defined schemata.
- PROP_NO_UPDATE: the property can be set on resource creation, but it won't be changed afterwards, by user or system. An example of this is content_type.
- PROP_NO_DELETE: properties are normally cleared of all pre-existing values before applying an update, so that the update completely overwrites all the properties and their values. However, a property that has this flag set will never have its previous values cleared, and these will accumulate with subsequent updates. An example of this is sub_id.
- PROP_NO_MDLIST: the property will not be displayed in the default metadata list in the presentation pages. It is still passed to the template and can be used by custom page templates.
- PROP_INDEX: include this property in the search index and in the list of search terms.
- PROP_MULTILANG: a property that can have values in multiple languages. The property type MUST be string. The string is entered in the laundry list with or without a language tag suffix. NOTE: this disables the maximum cardinality constraint check. To indicate a "default" value for properties that are expected to be single-valued, e.g. a work's label when displayed as a title, enter one value without the language tag. This is also a standard Linked Data way to request the default representation of a string.
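As a hedged sketch, flags might be combined as shown below. It rests on two assumptions that this section does not confirm: that the PROP_* constants are available as globals when the schema file is loaded, and that the Lua version in use supports the bitwise OR operator (Lua 5.3 or later). The property codenames and URIs are illustrative.

```lua
properties = {
    title = {
        uri = "pas:title",
        label = "Title",
        type = "string",    -- PROP_MULTILANG requires the string type
        -- Searchable and multilingual. The maximum cardinality check is
        -- disabled by PROP_MULTILANG, so max_cardinality is not set.
        flags = PROP_INDEX | PROP_MULTILANG,
    },
    internal_ref = {
        uri = "pas:internalRef",
        label = "Internal reference",
        -- Hidden from the default metadata list, still passed to templates.
        flags = PROP_NO_MDLIST,
    },
}
```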
Generator definitions
TODO Generators are still experimental. More information will be added when they are stabilized.