Monthly Archives: March 2022

Would you rather use LOCKSS, CLOCKSS or Portico? Three approaches to long-term preservation of scholarship

By Katharina Markus

Long-term access to scientific publications as a challenge

Scientific knowledge and published results are the backbone of new research. But digital information on the Internet is dynamic and short-lived. For science, this may mean that a relevant paper is no longer available and that links to it produce error messages. Not only is this knowledge then lost to scientific discourse; a publication that references the missing paper also loses credibility, since its references can no longer be verified. The three concepts introduced in this blog article aim to provide reliable access to scientific publications.

By now, a large number of scientific publications are available only digitally. If they are hosted exclusively on the publisher’s server and website, they may vanish when the publisher goes out of business, when a journal does not make enough profit and is discontinued, or when technical issues, maintenance or development become too costly. On the other hand, preventing this by establishing one large-scale central preservation set-up would require every publisher’s permission, which is difficult to obtain because of economic concerns and rights. Specifically, the rights to closed access content often lie exclusively with the publishers, and libraries license only access to it.

The licensing model also leads to another effect, separate from content vanishing from the Internet. If a library discontinues a subscription to an electronic journal, it not only cancels access to new journal issues. The library often also loses access to back issues, which it had been providing until that point. Since these back issues might still be relevant to the user community of the library, Post-Cancellation Access (PCA, also Perpetual Access) was formulated as a second aim in the context of preservation.

These concerns have been addressed by the library community in order to prevent the worst case – losing publications for good. Since no single institution has the capacity to ensure preservation of all publications due to the large amount of material, as well as the diversity of sources and rights, several solutions have been established. Diversification of preservation efforts also serves as an additional back-up strategy. Among these solutions are three internationally active initiatives: the Global LOCKSS Network (GLN) [1], CLOCKSS [2] and Portico [3]. The Keepers Registry is a registry of preserved journals, to which these three initiatives provide information about preserved content [4].

Due to the complexity of the topic, this text will concentrate on closed access content, specifically journals. Open access-specific aspects are included when serving as contrasting cases to closed access preservation approaches.

1. LOCKSS

The software Lots Of Copies Keep Stuff Safe (LOCKSS) is deployed globally and is used in various projects and networks [5]. The abbreviation LOCKSS can serve as a short-hand for various concepts and initiatives: the LOCKSS software, the LOCKSS Program, the LOCKSS Alliance and the Global LOCKSS Network (GLN). In this text the expression LOCKSS refers only to the software.

The software [6] is open source, developed by the community and principally maintained by the LOCKSS Program, which is situated at the Stanford Libraries [7]. An alliance of users, the LOCKSS Alliance, funds further development of the software through its members’ participation fees and receives services such as technical support in return [8].

LOCKSS is used in a network of nodes, each of which runs a LOCKSS box, so that content is hosted locally at the participating institutions. The software also connects the boxes and frequently generates integrity values in order to compare copies of the same object held at various locations, i.e. in various boxes. If one value dissents from the many matching values for copies of the same object, this implies corruption of that single copy, whereas many different values for the same object imply a more complex problem [9]. The transfer of content into the boxes is primarily designed as web harvesting [9, 10]. As part of the harvesting process, the publisher provides a LOCKSS permission statement, which the institution’s LOCKSS box can access via IP authentication [11]. With regard to preservation actions, migration to new formats is generally designed to be part of the access workflow [12]. If the source of the content, the publisher’s website, no longer responds, access can be provided from the locally stored version by various access methods [13]. Often, access is provided by a workflow or process that is triggered automatically when the source does not respond or when information arrives that the original source is no longer available; such an event is therefore called a “trigger event”. Since different LOCKSS networks exist, among them the Global LOCKSS Network (GLN), CLOCKSS and the PKP Preservation Network, the configuration, the plugins used and the definition of trigger events may differ from one network to another.
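The comparison of integrity values described above can be illustrated with a short sketch. It is a simplification and not the LOCKSS polling protocol itself: the use of SHA-256, the function names digest and audit, the box names and the simple majority rule are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Integrity value for one stored copy (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def audit(copies: dict[str, bytes]) -> dict:
    """Compare the integrity values of one object across several boxes."""
    digests = {box: digest(data) for box, data in copies.items()}
    counts = Counter(digests.values())
    winner, votes = counts.most_common(1)[0]

    if votes * 2 <= len(copies):
        # No value is shared by a majority of boxes: the "complex problem"
        # case, in which no single copy can be singled out as damaged.
        return {"status": "no_consensus", "damaged": []}

    # Boxes whose value dissents from the majority hold a suspect copy.
    damaged = sorted(box for box, d in digests.items() if d != winner)
    status = "repair_needed" if damaged else "ok"
    return {"status": status, "damaged": damaged}


# Hypothetical example: four boxes, one of which holds a corrupted copy.
copies = {
    "box_a": b"article v1",
    "box_b": b"article v1",
    "box_c": b"article v1",
    "box_d": b"article v1 with bit rot",
}
print(audit(copies))
```

In this example the three matching values outvote the single dissenting one, so only box_d is reported as needing repair. If every box reported a different value, the sketch would report no consensus instead of guessing which copy is correct.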

Global LOCKSS Network

GLN is the original LOCKSS network, and members of GLN have historically also become part of the LOCKSS Alliance; therefore membership fees correlate [8]. GLN provides access not only when publisher websites are unresponsive, but also as Post-Cancellation Access [14]. For PCA, access to the preserved content must be restricted, since the original is still available on the publisher’s website and the preserved copy would otherwise compete with it economically. The institutions or consortia of institutions sign contracts with publishers that allow publications to become part of GLN and to be integrated into the local LOCKSS box. If the content becomes unavailable, the granted rights and local hosting ensure continued access for those members of the institution who were allowed access before [15, 16]. Local hosting and a network with shared integrity control are distinctive features of GLN and, more broadly, of LOCKSS. Institutions maintain control over the locally hosted content, irrespective of the fate of the journal or of the preservation service. Since members of GLN generally conclude the contracts with publishers themselves, they can also decide which publications to add to the preservation network. Apart from this member-driven addition process, publishers can also request inclusion of their publications, although the decision stays with the GLN members [17].

2. CLOCKSS

The preservation network CLOCKSS (Controlled LOCKSS) also uses the LOCKSS software. It is a not-for-profit organization financed by participating libraries as well as publishers [18]. As the name indicates, the network is closed, with a set number of twelve archive nodes at various academic institutions worldwide [19], and is therefore characterized as a “Private LOCKSS Network”. Private LOCKSS Networks generally differ from open-ended LOCKSS networks only in intent and governance, since these closed groups have a common goal and do not invite new members with the intention of setting up new nodes [20]. Data transfer into the archive and preservation actions correspond to the possibilities provided by the LOCKSS software [21]. The specific concept of CLOCKSS is that archived publications become open access when a trigger event occurs [23]. Accordingly, libraries do not participate in the CLOCKSS initiative for reasons of access as such; rather, participants support CLOCKSS’ mission and gain a voice in the respective decisions [22]. While the community is involved in decisions, contracts are concluded between CLOCKSS as a legal entity and publishers [18].

 

3. Portico

Portico, a third example of a preservation service, uses yet another slightly different concept. It resembles CLOCKSS in that it is also set up as a non-profit organization, in this case as a service that is part of the education and research initiative ITHAKA (https://www.ithaka.org/), and is likewise financed by publishers and libraries. Access, however, is provided not only when publisher-hosted versions become unavailable (trigger events [23]) but also as PCA [24], which is similar to the GLN model. The main difference from the above-mentioned strategies is that content is hosted at a central institution, Portico, and is not distributed across nodes. While Portico offers PCA, it is available only for a subset of Portico content, as the publisher must allow PCA. Additionally, in order to use the PCA service, the library must be a Portico participant and has to request PCA, providing documentation about the formerly licensed content. This specific content is then accessible only to the requesting library. Access due to trigger events, by contrast, is not tied to any licences of the participating libraries but to Portico membership. Contracts as well as data transfer procedures are arranged between Portico and the publisher. Preservation management includes monitoring for technology obsolescence [25] and migration as the main preservation action [26]. Access is provided to members of the participating institution based on IP addresses [23, 24]. Since Portico takes care of publisher negotiations, hosting and preservation actions and provides PCA, participating libraries are relieved of many preservation efforts. However, access to closed access “triggered content” is provided only as long as institutions remain members of Portico. Open access content, in contrast, is made freely available in case of trigger events.
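The access rules in this section can be summarized as a small decision function. This is only one reading of the paragraph above, not Portico’s actual system: the class names, field names and return values are hypothetical and exist purely to make the conditions explicit.

```python
from dataclasses import dataclass


@dataclass
class Title:
    open_access: bool
    triggered: bool              # a trigger event has occurred for this title
    publisher_allows_pca: bool   # publisher has agreed to PCA for this title


@dataclass
class Library:
    portico_participant: bool
    documented_former_licence: bool  # PCA requested with licence documentation


def portico_access(title: Title, library: Library) -> str:
    """Return the kind of access the library gets (hypothetical labels)."""
    if title.triggered:
        if title.open_access:
            return "open"        # freely available, membership not required
        if library.portico_participant:
            return "triggered"   # tied to membership, not to former licences
        return "none"
    if (title.publisher_allows_pca
            and library.portico_participant
            and library.documented_former_licence):
        return "pca"             # accessible only to the requesting library
    return "none"


# Hypothetical case: a cancelled closed access journal with PCA allowed.
journal = Title(open_access=False, triggered=False, publisher_allows_pca=True)
member = Library(portico_participant=True, documented_former_licence=True)
print(portico_access(journal, member))
```

Written this way, the asymmetry becomes visible: PCA depends on three conditions at once, while triggered closed access content depends on membership alone.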

 

*

None of the three services is limited to journals or e-books; they also preserve other types of content and pursue specific projects or collaborations that go beyond the basic structure described here.

Keepers Registry

 

Finally, information about preserved journal issues is provided to the Keepers Registry [27] by various preserving institutions, among them GLN, CLOCKSS and Portico. The registry, in turn, offers information about preserved issues and about the preserving institutions or initiatives at a central portal that is free of charge. This service is limited to journals (e-serials), whereas information about other content preserved by the above-mentioned initiatives can be found on the website of the respective preservation service.

 

Preservation Services analysis and choices made by ZB MED

ZB MED has its own preservation system and is in the process of preserving publications as part of its own publication services. It is nevertheless also interested in using a preservation service to secure the preservation of the journals in its holdings. It therefore conducted an analysis that compared data about its own holdings with data from the Keepers Registry. Selecting the journals in its holdings that are covered by the above-mentioned preservation services showed large coverage by Portico. Although a small part of the holdings was not preserved by Portico, at least at the time of the analysis, the percentage of preserved journals was deemed sufficiently high to make a Portico membership beneficial. Additional benefits of Portico are PCA, access to all triggered content in Portico, and the fact that the effort of contract negotiations and object processing is left to the service. The membership leaves ZB MED free to concentrate its resources on preserving journals not covered by Portico or CLOCKSS.
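A comparison of this kind can be sketched in a few lines. The following is not the analysis ZB MED actually ran. It assumes that both the holdings and the Keepers Registry data have already been reduced to sets of ISSNs; the ISSNs shown are placeholders, and the function name coverage is hypothetical.

```python
def coverage(holdings: set[str], keepers: dict[str, set[str]]) -> dict[str, float]:
    """Share of held journals (identified by ISSN) preserved by each service."""
    if not holdings:
        return {name: 0.0 for name in keepers}
    return {
        name: len(holdings & preserved) / len(holdings)
        for name, preserved in keepers.items()
    }


# Placeholder ISSNs, not real journals.
holdings = {"0000-0001", "0000-0002", "0000-0003", "0000-0004"}
keepers = {
    "Portico": {"0000-0001", "0000-0002", "0000-0003"},
    "CLOCKSS": {"0000-0002"},
    "GLN": {"0000-0002", "0000-0005"},
}

shares = coverage(holdings, keepers)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of holdings preserved")

# Journals no service preserves remain candidates for in-house preservation.
uncovered = holdings - set().union(*keepers.values())
print("Not preserved by any service:", sorted(uncovered))
```

The set of uncovered journals corresponds to the part of the holdings on which an institution could concentrate its own preservation resources.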

*

In conclusion, the three initiatives have different features that partly overlap and partly complement each other. Individual libraries and other institutions involved in preservation efforts can decide where to participate according to their own preferences, using the information provided by the Keepers Registry.

 

Dr. Katharina Markus, Head of Digital Preservation

ZB MED – Information Centre for Life Sciences, Cologne, Germany

markus@zbmed.de

 

References

[1] https://www.lockss.org/join-lockss/networks/global-lockss-network, last accessed on 03/17/2022

[2] https://clockss.org/, last accessed on 03/17/2022

[3] https://www.portico.org/, last accessed on 03/17/2022

[4] https://keepers.issn.org/, last accessed on 03/17/2022

[5] https://www.lockss.org/join-lockss/networks, last accessed on 03/17/2022

[6] https://lockss.github.io/, last accessed on 03/17/2022

[7] https://www.lockss.org/about/program-and-people, last accessed on 03/17/2022

[8] https://www.lockss.org/join-lockss, last accessed on 03/17/2022

[9] https://www.lockss.org/about/frequently-asked-questions#technology, last accessed on 03/17/2022

[10] https://www.lockss.org/use-lockss/how-lockss-works, last accessed on 03/17/2022

[11] https://lockss.github.io/administrators/classic-lockss/basic-config/adding-aus.html, last accessed on 03/17/2022

[12] Rosenthal, D.S.H., Lipkis, T., Robertson, T.S. and Morabito, S. (2005), “Transparent format migration of preserved web content”, D-Lib Magazine, Vol. 11 No. 1, January, available at: www.dlib.org/dlib/january05/rosenthal/01rosenthal.html, last accessed on 03/17/2022

[13] https://www.lockss.org/use-lockss/how-lockss-works, last accessed on 03/17/2022

[14] https://www.lockss.org/use-lockss/post-cancellation-and-perpetual-access, last accessed on 03/17/2022

[15] https://keepers.issn.org/keepers#global-lockss-network, last accessed on 03/17/2022

[16] Seadle, M. (2010), “Archiving in the networked world: LOCKSS and national hosting”, Library Hi Tech, Vol. 28 No. 4, pp. 710-717. https://doi.org/10.1108/07378831011096321, last accessed on 03/17/2022

[17] https://www.lockss.org/use-lockss/publishers, last accessed on 03/17/2022

[18] https://clockss.org/join-clockss/, last accessed on 03/17/2022

[19] https://clockss.org/community/supporting-libraries/, last accessed on 03/17/2022; the CLOCKSS Archive “nodes” are marked with an asterisk

[20] https://lockss.github.io/administrators/admin/introduction, last accessed on 03/16/2022

[21] https://clockss.org/about/how-clockss-works/, last accessed on 03/17/2022

[22] https://clockss.org/faq/, last accessed on 03/17/2022

[23] https://www.portico.org/for-participants/#triggered, last accessed on 03/17/2022

[24] https://www.portico.org/for-participants/#pca, last accessed on 03/17/2022

[25] https://www.portico.org/our-work/preservation-approach/, last accessed on 03/17/2022

[26] https://www.portico.org/wp-content/uploads/2017/12/Portico-Format-Monitoring-Policies.pdf, last accessed on 03/17/2022

[27] https://keepers.issn.org/keepers, last accessed on 03/17/2022