Danish web resource archive

Static web resources (for instance PDF documents), as opposed to dynamic web resources (for instance web databases), are currently included in the Danish National Bibliography (NB), and DBC is responsible for cataloging these resources for the NB. However, the effort put into cataloging such resources loses value when the resources themselves move or disappear from the internet and the links from the metadata to the resources break.

A poor and resource-consuming remedy is an automated check for broken URLs, followed by a manual search for the correct new URL and an update of the record. Another way of dealing with the problem is to preserve a copy of the cataloged resource in an archive.
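To illustrate the first remedy, an automated broken-link check can be sketched roughly as follows. This is a minimal sketch, not a description of DBC's actual tooling: the function names, the record format (pairs of record id and URL) and the rule that any answer outside the 2xx/3xx range counts as broken are all assumptions made for the example.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request, or 0 if the URL is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 410.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout or malformed URL.
        return 0


def find_broken_links(
    records: Iterable[tuple[str, str]],
    status_of: Callable[[str], int] = http_status,
) -> list[str]:
    """Return the ids of records whose URL does not answer with a 2xx/3xx status.

    `records` holds hypothetical (record_id, url) pairs taken from the metadata.
    """
    return [
        record_id
        for record_id, url in records
        if not 200 <= status_of(url) < 400
    ]
```

The check itself is cheap. The costly part is what follows: each record id returned still has to be handed to a person who searches for the new location and corrects the metadata by hand.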

This is in fact what we have started doing at DBC. Following legal deposit legislation, the entire .dk domain is harvested and archived by the legal deposit institutions. However, this archive is inaccessible to the public due to privacy issues. Thus, we have not been able to just link to the legal deposit archive. Instead, we have introduced an archive of our own. The archive differs from the legal deposit archive in several ways:

  • it only reflects resources that have been cataloged by DBC
  • it can only be accessed from the metadata records (no direct search access to the archive)
  • due to intellectual property rights issues it only contains freely accessible resources

We have had to address intellectual property rights issues as well as privacy issues: the former by consulting the proper authorities, the latter by asking the publishers for permission to archive their resources.

For practical reasons, we have started by archiving PDF files, since this is much easier than archiving HTML pages with internal and external links. We have enhanced our cataloging tool so that, when cataloging a specific PDF file, the cataloger can check whether we have a permit from the publisher and can initiate archiving directly in the cataloging process. At the moment, we have permits from approximately 155 publishers.
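The permit check and the archiving step in the cataloging workflow could look roughly like the sketch below. It is illustrative only: the `Archive` class, its method names, the in-memory storage and the use of a SHA-256 digest as a receipt are hypothetical and do not describe how DBC's cataloging tool is actually built.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Archive:
    """Hypothetical archive: stores copies only for publishers that have granted a permit."""

    permitted_publishers: set[str]
    stored: dict[str, bytes] = field(default_factory=dict)

    def has_permit(self, publisher: str) -> bool:
        """Check whether the publisher has given permission to archive."""
        return publisher in self.permitted_publishers

    def archive_pdf(self, record_id: str, publisher: str, content: bytes) -> Optional[str]:
        """Store a copy under the metadata record id.

        Returns the SHA-256 digest of the stored copy, or None if no permit exists.
        """
        if not self.has_permit(publisher):
            return None
        self.stored[record_id] = content
        return hashlib.sha256(content).hexdigest()


# Usage: the cataloger triggers archiving while cataloging a PDF file.
archive = Archive(permitted_publishers={"Example Agency"})
receipt = archive.archive_pdf("rec-1", "Example Agency", b"%PDF-1.4 demo content")
```

Keying the stored copy on the metadata record id reflects the fact that the archive is reached only from the metadata records, not through a search interface of its own.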

We started archiving web resources as part of the cataloging process in the middle of 2009, and since then we have archived 3,400 resources, books as well as articles, mostly published by various authorities. We have discontinued URL checks for archived resources.

By the end of the year, library.dk (the union catalog for Danish citizens) will offer links to the live version of the resources as well as the archived version. This way, users can access the cataloged resources even if they are no longer available on the internet.

Carsten H. Andersen, Director of Bibliographic Division, DBC