Static site generation

Terms appearing in bold are referenced in the glossary.

General remarks

A static site is a collection of HTML pages and other supporting assets that are pre-generated, in contrast with most modern dynamic sites found on the web, which generate contents on demand from users' browsing activity.

Dynamic websites have an overwhelming number of advantages over static ones, and are the only option when publishing voluminous and complex bodies of information. For the purpose of Pocket Archive, however, a static site is a better fit, as it deals with contents that change slowly and that are, in most foreseen cases, of modest size. What "modest" means is still only roughly defined: real-world usage will allow a more accurate estimate of whether a static site is appropriate for a specific archive. For now, it should be safe to estimate that Pocket Archive can generate a site of several thousand resources within a reasonable time on a modestly powerful machine.

Generation and access

A static site is generated once, and after that its files can be shared and viewed locally, or remotely. The page URLs generated by Pocket Archive are suitable for both scenarios without any modification to the generation options.

When viewed locally, the whole site can be accessed without any specialized software other than a web browser. The files are on the user's local disk and the browser points to them using a file protocol URL, e.g. file:///home/user/my_pkar_site/index.html. "Local" does not mean, however, that the files must stay in the same place. They can be copied (preferably packed with tar or zip when moving them) and shared via network, portable flash drives, etc. This is very useful in scenarios where a reliable Internet connection is a challenge.
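As an illustration of moving a generated site as a single archive, the following is a minimal sketch using tar. The function name pack_site and all paths are hypothetical and not part of Pocket Archive.

```shell
#!/bin/sh
# Pack a generated site folder into one compressed tarball for offline sharing.
# Usage: pack_site SITE_DIR ARCHIVE_FILE
# (pack_site is a hypothetical helper, not a pkar command.)
pack_site() {
    site_dir=$1
    archive=$2
    # -C changes to the parent folder first, so paths stored in the archive
    # start at the site folder name instead of the full absolute path.
    tar -czf "$archive" -C "$(dirname "$site_dir")" "$(basename "$site_dir")"
}

# Example with hypothetical paths:
#   pack_site /home/user/my_pkar_site /tmp/my_pkar_site.tar.gz
# On the receiving machine:
#   tar -xzf my_pkar_site.tar.gz
#   then open my_pkar_site/index.html in a browser.
```

Packing the folder keeps the relative links between pages intact, since the whole directory tree travels as one unit.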

For broader distribution, a remote option is better. An HTTP server must be set up in this case. Nginx is a popular choice but quite oversized for static-only contents. Simpler options such as darkhttpd or Merecat are excellent static site servers with a tiny footprint. A remote site is also the better solution for archives that change regularly, since an update to the site is immediately available to all viewers.
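A minimal sketch of serving a generated site with darkhttpd follows. It assumes darkhttpd is installed and on the PATH; the function name serve_site, the paths, and the default port 8080 are illustrative choices, not Pocket Archive settings.

```shell
#!/bin/sh
# Serve a generated site folder over HTTP with darkhttpd.
# Usage: serve_site SITE_DIR [PORT]
# (serve_site is a hypothetical helper; the default port is an arbitrary choice.)
serve_site() {
    site_dir=$1
    port=${2:-8080}
    # Refuse to start if the folder does not look like a generated site.
    if [ ! -f "$site_dir/index.html" ]; then
        echo "No index.html found in $site_dir" >&2
        return 1
    fi
    # darkhttpd takes the web root as its first argument; it runs in the
    # foreground until interrupted.
    darkhttpd "$site_dir" --port "$port"
}

# Example with a hypothetical path:
#   serve_site /home/user/my_pkar_site 8080
```

The check for index.html is only a sanity guard against pointing the server at the wrong folder.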

If something is updated in the archive, the whole site must be regenerated. This may take longer on larger archives, but the goal is to keep the process efficient enough that it does not become a significant bottleneck. Shortcuts to regenerate only some resources, a collection, or only the static assets may be added at a later time. This may become complicated, however, as resources can be inter-linked in several ways.

Static sites can only be generated locally, i.e. access to the pkar shell command on the machine hosting the archival data is necessary. There are currently two types of site generators: one for the presentation, and one for the content model.

Presentation generator

The presentation site is the one shared with end users. It contains neatly laid out information about the contents, thumbnails, an index page with recently added works and collections, a simple search engine (also running entirely out of static files), and delivery files attached or displayed inline.

To generate the site, simply run

pkar gen-site

The program output will indicate where the site has been generated. This location can be defined in the configuration file (app.lua).

Configuration options available are:

  • pres_gen.title: title of the site appearing in the header and titles.
  • pres_gen.out_dir: output directory for the generated files.
  • pres_gen.max_homepage_items: maximum number of items in the "Recent collections" and "Recent works" sections of the home page.

Content model documentation generator

This generator is meant to provide a convenient reference for catalogers. Since reading the configuration files to find out which content types are available, which properties are allowed for each content type, and which constraints are set on each property is quite complicated, this utility provides a plainly laid out HTML document listing all this information. This page can be generated (and updated on content model updates), and distributed to catalogers, in the same ways described for the presentation site, but it is even more portable because it is a self-standing single file.

The command

pkar gen-cmdoc [path]

generates the documentation page in the directory specified by the path argument, or writes it to standard output if no path is specified.
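Since the page goes to standard output when no path is given, it can be redirected to a file of the caller's choosing. The sketch below illustrates this; the function name export_cmdoc and the file names are hypothetical.

```shell
#!/bin/sh
# Capture the content model documentation from standard output into a file.
# Usage: export_cmdoc OUTPUT_FILE
# (export_cmdoc is a hypothetical helper wrapping "pkar gen-cmdoc".)
export_cmdoc() {
    out_file=$1
    # With no path argument, the page is written to standard output.
    pkar gen-cmdoc > "$out_file" || return 1
    # Succeed only if something was actually written.
    [ -s "$out_file" ]
}

# Example with a hypothetical file name:
#   export_cmdoc /tmp/content_model.html
```

Because the result is a single file, it can then be sent to catalogers as is.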

Presentation file transformers

Many files submitted to the archives are not suitable for viewing in a browser. Some may be too large, some may be in a format that is good for archival (e.g., TIFF images or WAV audio) but not supported by most web browsers. For this reason, the presentation generation includes a step to transform production master files into presentation files.

Transformers can be adjusted to a presentation site's needs via configuration. The configuration file controlling this behavior is model/generation.lua, under the defined configuration folder.

The details of this configuration will be added to the Configuration guide. This guide and most of the functionality are under construction, and only two options are currently available: one to copy the production master unchanged, and the other to resize images for thumbnail use and for full presentation size.

Multiple archives, multiple sites

Multiple archives may be managed under the same host. These archives would be entirely isolated from one another: they would accept separate submissions and generate independent presentations. This can be attained by keeping multiple Pocket Archive configurations, each defining its own locations for archive data, drop box, and presentation content output. E.g.,

PKAR_CONFIG_DIR=/opt/pkar/arch1/config pkar gen-site
PKAR_CONFIG_DIR=/opt/pkar/arch2/config pkar gen-site

could generate two separate sites for archives configured in different locations.
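With more than a couple of archives, the same pattern can be wrapped in a loop. The following is a minimal sketch under that assumption; the function name gen_all_sites is hypothetical, and the configuration paths are the illustrative ones used above.

```shell
#!/bin/sh
# Generate the presentation site for each configuration folder given.
# Usage: gen_all_sites CONFIG_DIR...
# (gen_all_sites is a hypothetical helper, not a pkar command.)
gen_all_sites() {
    status=0
    for config_dir in "$@"; do
        echo "Generating site for $config_dir"
        # PKAR_CONFIG_DIR is set for this single invocation only.
        if ! PKAR_CONFIG_DIR="$config_dir" pkar gen-site; then
            echo "Generation failed for $config_dir" >&2
            status=1
        fi
    done
    # Non-zero if any archive failed, so callers such as cron can notice.
    return "$status"
}

# Example:
#   gen_all_sites /opt/pkar/arch1/config /opt/pkar/arch2/config
```

The loop continues past a failed archive rather than stopping, so one broken configuration does not block the others.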

Similarly, running multiple instances of a low-footprint web server, or configuring multiple virtual hosts, each pointing to a different content directory and listening on a different address or port, would be relatively straightforward, as long as the number of maintained archives is manageable. When using a more complex HTTP server such as Nginx, each site can even be password-protected independently. This guide will not go into the details of these setups, as much better generic guides on web server setup are available elsewhere.