Accessibility and visibility: considering the needs and contexts of repositories in South Africa

Global momentum for open access and open science is putting pressure on researchers and institutions to make all their outputs, including research data, as accessible and visible as possible.

A digital repository is a software system that manages and provides access to digital objects. In a university setting these digital objects include publications, theses and dissertations, data — from field-specific to general — or special collections of research materials that have been digitised for preservation and broader access. These all have different needs, end-users and requirements. Repository options can range from highly complex and feature-laden software systems — that integrate with a range of open science systems for maximum visibility — to systems specifically designed for low-resource settings where Internet access is not assumed and no technical skills are required by the repository managers.

“While many South African institutions have their own repositories we see a wide variation within that group,” says Wesley Barry, ORCID support specialist at TENET. “Some of the larger universities have highly sophisticated repositories, including data repositories with accompanying support teams, while others are struggling with outdated systems because they don’t have the support for technical upgrades.”

However, the global momentum building for open access and open science is putting pressure on researchers and institutions to make all their outputs, including research data, as accessible and visible as possible.

The push to publish research data

For Dr Benito Trollip, a researcher at the South African Centre for Digital Language Resources (SADiLaR) and an open data advocate, the first step is to help South African researchers, particularly those in the humanities, understand the value of their data for research other than their own.

“Researchers can spend months or even years gathering and refining a data set,” explains Trollip. “They may use this for their own research and publications, but not see the value this data may have for a researcher in another field. For instance a political scientist working on a dataset of speeches by African presidents over three decades may not realise the value such data could have for a linguist.”

He argues that making this data available opens up possibilities for wider research. Considering how much research is paid for by public funds, there is also a moral argument that all outputs of the work, both publications and data, should be publicly accessible.

“Not everybody has the funds to, for instance, go into the field and conduct interviews with a particular group of people, but making this anonymised data available could create interesting opportunities for other avenues of research.”

Trollip acknowledges that data can be sensitive, but rather than using this as an excuse to lock it away he points to data management strategies around anonymising data or making data available only for specific, pre-vetted, research needs.

While the public good argument for data publication does incentivise some researchers, funders and journals are increasingly mandating data publication, which is perhaps the stick to make the majority of researchers and research institutions start considering the options.

A challenge of resources

Effective repositories require significant resources, particularly in terms of skills and staff. The first step, getting a repository up and running and ensuring the software is properly maintained, requires particular technical skills. Here, says Barry, comes the first challenge. Repositories are often owned by the university libraries, which usually requires the library to have its own specialised IT department to manage them or, failing that, a strong relationship with the university’s central IT to ensure that upgrades to these systems are prioritised.

“The other problem is a huge skills shortage in South Africa in this space,” says Barry. “Even if you do train up staff to effectively manage the technical side of the repository, we see such a high level of mobility of university staff in South Africa, especially highly-skilled staff, there is no guarantee they will stay.”

The next challenge is that a successful repository needs skilled support staff to help researchers upload their data in a way that is FAIR (findable, accessible, interoperable and reusable). This involves a number of considerations, including providing effective metadata (data about the data) and publishing datasets in formats that are machine-readable and non-proprietary.

This takes specialised expertise and skills. Even more challenging is that data management is also field-specific, with different requirements in different fields.
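The two considerations named above, complete metadata and open, machine-readable formats, can be checked automatically before a deposit reaches support staff. The following is only a minimal sketch: the list of required fields, the set of accepted file extensions and the function name deposit_problems are all hypothetical choices made for illustration, not the policy of any repository mentioned in this article.

```python
from pathlib import Path

# Hypothetical policy for illustration only; a real repository would define
# its own required fields and accepted formats, often per research field.
REQUIRED_FIELDS = ("title", "creators", "description", "licence", "identifier")
OPEN_FORMATS = {".csv", ".json", ".txt", ".xml"}


def deposit_problems(metadata: dict, filenames: list[str]) -> list[str]:
    """Return human-readable problems found in a proposed data deposit."""
    problems = []
    # Metadata check: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing metadata field: {field}")
    # Format check: flag files whose extension is not on the open-format list.
    for name in filenames:
        if Path(name).suffix.lower() not in OPEN_FORMATS:
            problems.append(f"not an open, machine-readable format: {name}")
    return problems


if __name__ == "__main__":
    # Invented example record; the identifier is a placeholder, not a real DOI.
    record = {"title": "Presidential speeches", "creators": ["A. Researcher"]}
    for problem in deposit_problems(record, ["speeches.xlsx"]):
        print(problem)
```

A check like this only catches omissions. Judging whether the metadata is meaningful for a given field still needs the skilled support staff described above.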

But there are also challenges around the resources of the end-user. In his paper, “Reflections on design principles of a digital repository in a low-resource environment”, Professor Hussein Suleman, head of the University of Cape Town’s Department of Computer Science, notes the low levels of reliable Internet access in Africa compared with Europe and America, and that most current repository toolkits only work over the Internet.

These are factors that should be borne in mind when investigating repository options.

Trollip notes that an effective repository requires ongoing management and advocacy. “This is not a ‘build it and they will come’ situation,” he says.

Considering the requirements and architecture of a repository

TENET, as the lead organisation in South Africa’s ORCID consortium, recommends that, at the very least, the repository software integrates with ORCID.

The ORCID iD is a free, unique and persistent identifier, which stays the same irrespective of name or institution changes, and links to a public profile where a researcher’s record, including affiliations, research outputs, social media and more, can be accessed. ORCID integrates across a wide range of research and academic institutions, and is used by funders, publishers and institutions to maintain records. For both researchers and their institutions, the value of an ORCID iD also lies in increased visibility.
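For repository managers who capture ORCID iDs on deposit, a basic sanity check helps catch typing errors before a record is linked to the wrong profile. An ORCID iD is 16 characters in four hyphenated groups, and the last character is a check character calculated with the ISO 7064 MOD 11-2 algorithm. The sketch below validates only that structure; the function names are illustrative, and it does not confirm that the iD is actually registered, which would require a lookup against ORCID itself.

```python
import re

# Four groups of four characters; the final character may be a digit or "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """Return True if the string is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid_id):
        return False
    compact = orcid_id.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
    print(is_valid_orcid("0000-0002-1825-0097"))
```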

Visibility is also critical for a successful repository, says Trollip. “Just because something is online does not mean the right people will find it. The managers of the repository need to use all the tools at their disposal to make the contents of their repository, be it data or publications or just a specific collection, as findable as possible.”

For the same reason, Barry recommends that institutions look for repositories that integrate with the various tools and platforms available. As well as ORCID, these include DataCite and Crossref, among others.

“Ideally you want a software that can pull information from other systems and integrate with common global platforms to reduce the need for time-consuming manual entries but also for greater visibility.”

However, in a low-resource setting like South Africa other factors need to be considered when choosing a repository.

“The technical architecture of current repository toolkits is not ideal for low-resource environments,” argues Suleman in his paper. “And an alternative architecture specifically designed for low-resource environments may be a more effective solution, especially for the digital humanities and heritage domain where collections are not funded in perpetuity and technical skills and other resources are limited.”

In his paper he discusses a minimalistic approach to repository architecture that reduces dependencies, particularly on network access and data storage. He describes a number of decisions that were factored into the design of the Five Hundred Year Archive, a collection of digital objects and associated metadata related to pre-colonial history in South Africa.

Read the full paper for details of the design principles which went into building a repository for low-resource environments.

Ensuring your repository is fit for context

“There is no one-size-fits-all solution when it comes to repositories,” says Barry. “The key is to have in mind, before a decision is made, who your users are, and what their needs are.”

The open access movement took off in developed countries, where institutions have access to far more resources, and following their example is unlikely to work as effectively in Africa. As with so many things, the most effective solutions will be context-specific.