Content modeling manual
Note: content model configuration is in heavy flux, and some of this documentation may be incomplete and/or change frequently.
The content model governs all of Pocket Archive's content input and output. It validates incoming submissions, verifying that contents and their metadata are entered according to specific criteria, and provides additional information for Linked Data export and for the user-facing presentation.
For a generic introduction to content modeling in Pocket Archive, see the content modeling primer.
Writing content model configuration
The suggested method for starting a new content model is to copy the
configuration files found in config/model/schema and adjust them to one's
needs. Advanced users can write a whole content model from scratch, using only
the core schema (see below) as the base.
Inheritance
Schema definitions follow an inheritance model, whereby a schema has a
broader attribute that defines its super-type, from which it inherits all
property definitions. Note that it does not inherit any top-level attributes,
which MUST be defined for, and are specific to, each content type.
A parent property may be redefined by adding or replacing attributes. All
values of a multi-valued attribute are replaced completely, not merged. For
example, suppose the my_work schema has a property has_file of type resource
with a range of {image_file = true}. If a sub-type of my_work, e.g.,
my_other_work, wants to add text_file to the range of has_file, it
must specify the full set, {image_file = true, text_file = true}, in the
override definition.
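As an illustration of the override rule above, the two schemas might look like the following sketch. The broader attribute, the has_file property, its resource type and the range tables come from the text. The properties container, the uri and label values, and the way each table is bound to a local variable are assumptions made for the example; each table merely stands in for the contents of one schema file, and the actual file layout should be checked against the sample configuration.

```lua
-- Stand-in for the my_work schema file (layout assumed, not taken from
-- the shipped configuration).
local my_work = {
    broader = "work",
    properties = {
        has_file = {
            uri = "pas:hasFile",      -- assumed URI
            label = "Has file",       -- assumed label
            type = "resource",
            range = {image_file = true},
        },
    },
}

-- Stand-in for the my_other_work schema file. Only the attribute being
-- changed is listed; uri, label and type are inherited from my_work.
local my_other_work = {
    broader = "my_work",
    properties = {
        has_file = {
            -- The range is replaced as a whole, so image_file must be
            -- repeated alongside the newly added text_file.
            range = {image_file = true, text_file = true},
        },
    },
}
```

Writing only {text_file = true} in the override would, under the replacement rule, drop image_file from the allowed targets.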
Core schema and predefined content types
The content model is defined by configuration files in the schema folder.
Each file defines one content type.
Some content types are considered part of the core functionality of Pocket
Archive. These are defined by the configurations in core_schema, which comes
with the Pocket Archive installation and cannot be altered. These core
types include the foundational types, such as resource, work, and file.
The core types are extensible in the user configuration by creating sub-types. All user-defined schemas must have file names that do not conflict with the core type names.
Pocket Archive ships with a sample configuration with commonly used content types. For some very simple archives, this may be enough to get started with little or no customization. For a setup which needs to define more numerous or complex content types in a more articulated way, additional types can be defined. Please look at the default model configuration files that come with Pocket Archive.
Each type definition is encoded in a configuration file defining a single
content category type. One doesn't have to define all possible types in detail.
Pocket Archive provides some basic types, e.g.: Resource (the super-class of
them all), Work, File, Brick. To add more specific definitions,
subtypes can be defined. A subtype inherits all the property definitions of
its broader model, and adds more specific behavior. An example classification
could be: Resource → File → Image File → Scientific Image. Each of the
sub-types would only define the special properties of that definition, which
add to, or replace, the properties of its broader definitions.
All resources in Pocket Archive must be assigned a content type. If someone has to deal with a resource that doesn't fit in any of the predefined content models, they can assign it the most specific type that they can. At worst, they can put it under Resource. Of course, if one starts dealing with many unclassifiable resources that look similar, it's probably best to define a content type for them; but that is not mandatory.
In addition to these three mandatory components, a property can have an optional description. This, when present, shows in the presentation as an "info" icon and a pop-up with a concise explanation of the property's scope and purpose.
Constraints
Each metadata field can be restricted by constraints. These constraints can apply to:
- Type: the data type for the field, e.g. string, number, resource (relationship), etc.
- Cardinality: how many values can be set for a field, for each resource. These values can be adjusted to set mandatory fields, single-valued fields, etc.
- Range: the range of values allowed. How this is interpreted depends on the data type.
It is advisable to keep constraints, ehm, constrained. An excessively prescriptive content model can become frustrating and even unusable for users.
Detailed definition of constraints in the configuration follows below.
Property definitions
Properties SHOULD be defined as accurately as practically possible. A good content model configuration (which means, carefully organized, neither too strict nor too vague, and exhaustively annotated with labels and descriptions) can greatly improve the quality of the archive data while facilitating the archivist's job.
Currently, Pocket Archive will accept any properties with undefined code
names in a submission. These will be given a label equal to their code name
and a URI of par:[code name]. Note that these properties may be hard to
understand by users browsing the discovery interface. Pocket Archive may in the
future allow disabling the entry of undefined properties by configuration, or
forbid them altogether in the application framework.
Each property has the following attributes:
[property_id].uri
Mandatory: yes
Type: string
URI of the property. This SHOULD be unique within the content model. It MUST
be a namespace-prefixed string using prefixes from the configured namespace
map — e.g., pas:contentType.
[property_id].label
Mandatory: yes
Type: string
Human-readable label used for display in presentation and other user-facing platforms.
[property_id].description
Mandatory: no
Type: string
Default: (none)
Longer description of the property. It is displayed as a tooltip text in the presentation. This may be useful for both catalogers and end users to better understand the scope and purpose of the property.
[property_id].type
Mandatory: no
Type: string
Default: "string"
Data type for the property. It MAY be one of:
- string: a generic string. This is the default if not defined.
- url: a string that will display as a link in the presentation. Pocket Archive does not check that the URL is well-formed or that it points to an existing resource.
- integer: an integer number.
- decimal: a number with a limited decimal part.
- float: a floating-point number.
- datetime: a date value, optionally including time. It SHOULD be in the ISO 8601 format.
- resource: an internal Pocket Archive resource. This behaves similarly to url, except that the value MUST be a UID in the same archive, and the consistency of the link is always checked.
- structured: a content type that allows a more complex data structure. [WIP note: not yet implemented]
Each data type may be mapped to an RDF data type in the configuration, under
md.datatypes. E.g., in the default configuration:
    datatypes = {
        integer = "xsd:integer",
        decimal = "xsd:decimal",
        float = "xsd:double",
        boolean = "xsd:boolean",
        datetime = "xsd:datetime",
        url = "xsd:anyURI",
    }
Unmapped data types map to xsd:string.
New data types may be defined in the local model and mapped to specific RDF types. Pocket Archive treats them as plain strings.
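A locally defined data type could be mapped as in the sketch below. The type name year, the property that uses it, and its URI are hypothetical; xsd:gYear is a standard XML Schema type chosen only as a plausible target. The default entries are assumed to remain in place next to the new one.

```lua
-- Sketch of an extended mapping under md.datatypes.
datatypes = {
    -- ... default entries as listed above ...
    year = "xsd:gYear",    -- hypothetical local data type
}

-- Hypothetical property using the local type. Pocket Archive handles the
-- value as a plain string; the mapping above only concerns the RDF data type.
local properties = {
    year_published = {
        uri = "pas:yearPublished",    -- assumed URI
        label = "Year published",
        type = "year",
    },
}
```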
[property_id].max_cardinality
Mandatory: no
Type: integer
Default: (unbounded)
Maximum number of values that a property can have on any single resource. A value of 1 means that the property is single-valued.
[property_id].min_cardinality
Mandatory: no
Type: integer
Default: 0
Minimum number of values that a property must have on any single resource. A value of 1 means that the property is mandatory.
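The attributes described so far might combine as in the following sketch, which shows a mandatory single-valued property, an optional single-valued one, and one that relies entirely on defaults. The attribute names are those documented above; the properties container, the property IDs, and the dcterms: prefix (assumed to be present in the configured namespace map) are illustrative.

```lua
-- Hypothetical property definitions; only the attribute names are from
-- this manual.
local properties = {
    title = {
        uri = "dcterms:title",
        label = "Title",
        description = "Main title of the work, as it appears on the item.",
        type = "string",
        min_cardinality = 1,    -- at least one value: mandatory
        max_cardinality = 1,    -- at most one value: single-valued
    },
    date_created = {
        uri = "dcterms:created",
        label = "Date created",
        type = "datetime",
        max_cardinality = 1,    -- optional (min defaults to 0), single-valued
    },
    subject = {
        uri = "dcterms:subject",
        label = "Subject",
        -- No type: defaults to "string".
        -- No cardinality: zero to unbounded values are accepted.
    },
}
```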
[property_id].range
Mandatory: no
Type: (variable)
Default: (none)
[WIP note: Not implemented yet] Range of accepted values. This has
different interpretations depending on the data type of the property: for a
string it is a regular expression string that the input values must match; for
a number, a min/max range in the form of a table: {min, max}; for a
relationship, a set (table) with one or more content type IDs as keys
representing allowed content types of the relationship target, e.g.,
{still_image_file = true, document_file = true}.
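The three interpretations of range could be written as in the sketch below. Since the feature is marked as not yet implemented, this only illustrates the forms described above. The property IDs, URIs and labels are hypothetical, and because the regular expression dialect is not specified here, the pattern is kept to syntax that common dialects share.

```lua
-- Hypothetical properties, one per interpretation of range.
local properties = {
    accession_number = {
        uri = "pas:accessionNumber",    -- assumed URI
        label = "Accession number",
        type = "string",
        range = "^[0-9]+$",             -- string: regular expression (digits only)
    },
    rating = {
        uri = "pas:rating",             -- assumed URI
        label = "Rating",
        type = "integer",
        range = {1, 5},                 -- number: {min, max}
    },
    has_file = {
        uri = "pas:hasFile",            -- assumed URI
        label = "Has file",
        type = "resource",
        -- relationship: set of allowed content type IDs for the target
        range = {still_image_file = true, document_file = true},
    },
}
```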
[property_id].flags
Mandatory: no
Type: bitmask
Default: 0 (no flags)
Special handling flags for the application. The value of this attribute is a bitmask of the following constants:
- PROP_PROTECTED: the property is entirely system managed and cannot be modified by the cataloger. An example of this is last_modified.
- PROP_NO_UPDATE: the property can be set on resource creation, but it won't be changed afterwards, by user or system. An example of this is content_type.
- PROP_NO_DELETE: properties are normally cleared of pre-existing values before applying an update, so that the update completely overwrites all the properties and their values. However, a property that has this flag set will never have its previous values cleared, and these will accumulate with subsequent updates. An example of this is sub_id.
- PROP_NO_MDLIST: the property will not be displayed in the default metadata list in the presentation pages. It is still passed to the template and used by custom page templates.
- PROP_INDEX: include this property in the search index and in the list of search terms.
- PROP_MULTILANG: a property that can have values in multiple languages. The property type MUST be string. The string is entered in the laundry list with a language tag suffix. NOTE: this disables the maximum cardinality constraint check. To indicate a "default" value for properties that are expected to be single-valued, e.g. a work's label when displayed as a title, enter one value without the language tag. This is also a standard Linked Data way to request the default representation of a string.
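Because flags is a bitmask, several constants can be combined with a bitwise OR, as in the sketch below. It assumes Lua 5.3 or later for the | and << operators. The numeric values are stand-ins so that the fragment is self-contained; in a real configuration the constants are provided by Pocket Archive, and how they are brought into scope is not covered here. Property IDs and URIs are hypothetical.

```lua
-- Illustrative stand-ins only: the real constants come from Pocket Archive
-- and their actual numeric values may differ.
local PROP_NO_MDLIST = 1 << 3
local PROP_INDEX     = 1 << 4
local PROP_MULTILANG = 1 << 5

local properties = {
    label = {
        uri = "rdfs:label",          -- assumed namespace prefix
        label = "Label",
        type = "string",             -- PROP_MULTILANG requires type string
        -- Searchable and multilingual: two flags combined with bitwise OR.
        flags = PROP_INDEX | PROP_MULTILANG,
    },
    internal_note = {
        uri = "pas:internalNote",    -- assumed URI
        label = "Internal note",
        -- Hidden from the default metadata list; still passed to templates.
        flags = PROP_NO_MDLIST,
    },
}
```

Omitting flags altogether is equivalent to the default of 0, i.e. no special handling.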