Downloading from the ORA repository

ORA is an Open Access platform, and we are committed to making as much of our content as possible available to as many users as possible. We are happy to work with those who want to download all or a large sample of the ORA website. However, we ask that you contact us first so that we can advise on the best way to do this without causing problems for the ORA website. Unexpected 'spidering' or 'web scraping' of our content may lead to us blocking your access to the service.

Please note that ORA is a repository platform and not a publisher. Although our content is free to download, we are not the copyright owners. By using this service you are agreeing to the ORA Terms of Use. Individual records and binary files may also have their own licences or terms that describe how that content can be reused.

We make a version of our metadata available under a CC0 licence (see the RIOXX Terms CC0 metadata format, below).

OAI-PMH: a guide for harvesters

ORA supports and participates in the Open Archives Initiative (OAI). ORA is a registered OAI-PMH data provider and provides metadata for all public records; this metadata is updated as soon as each record is published or updated.

Base URL

The OAI-PMH endpoint implements OAI-PMH version 2.0 and is available at the base URL https://ora.ox.ac.uk/oai2
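As an illustration, here is a minimal Python sketch that builds request URLs against this base URL using only the standard library. The helper name oai_url is hypothetical; the verb and argument names are the standard OAI-PMH ones.

```python
from urllib.parse import urlencode

# Base URL as given above.
BASE_URL = "https://ora.ox.ac.uk/oai2"


def oai_url(verb, **arguments):
    """Build an OAI-PMH GET request URL for the given verb.

    Keyword arguments become OAI-PMH request arguments. Because 'from'
    is a reserved word in Python, pass it as from_.
    """
    params = {"verb": verb}
    for name, value in arguments.items():
        params["from" if name == "from_" else name] = value
    return BASE_URL + "?" + urlencode(params)


print(oai_url("Identify"))  # https://ora.ox.ac.uk/oai2?verb=Identify
```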

Item = ORA record

Each record in ORA is modeled as an Item in the OAI-PMH interface. Only the most recent version of each record is exposed via this interface.

Datestamps

Every OAI-PMH metadata record has an associated datestamp, which is the time the record was last modified on the ORA public website.

Because the current ORA public website dates from April 2018, the OAI-PMH datestamp values do not correspond with the original submission or publication times for older records, and may not for newer records because of administrative and bibliographic updates.

The earliest datestamp is given by the <earliestDatestamp> element of the Identify response.
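A minimal Python sketch for reading <earliestDatestamp>, together with the datestamp granularity, from an Identify response. The function name identify_info is hypothetical; the element names and namespace come from the OAI-PMH 2.0 schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by all OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_info(identify_xml):
    """Return earliestDatestamp and granularity from an Identify response.

    identify_xml is the raw response body (bytes or str), for example from
    urllib.request.urlopen("https://ora.ox.ac.uk/oai2?verb=Identify").read().
    """
    root = ET.fromstring(identify_xml)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("not an Identify response")
    return {
        "earliestDatestamp": identify.findtext("oai:earliestDatestamp", namespaces=OAI_NS),
        "granularity": identify.findtext("oai:granularity", namespaces=OAI_NS),
    }
```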

The OAI-PMH interface does not support selective harvesting based on publication date. The datestamps are designed to support incremental harvesting of updates on an ongoing basis. It is not possible to selectively harvest only, say, records published in February 2017.

Apart from selective harvesting by type of work (see the description of Sets below), the interface is designed to support copying and synchronization of a complete set of ORA metadata. To harvest metadata for all records, either make requests without a datestamp range (recommended), or make requests from the <earliestDatestamp> through to the present (but be aware that, because of bulk updates, some dates have very large numbers of updated records).
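The complete harvest described above can be sketched in Python as a loop that follows OAI-PMH resumptionTokens until the last page. This is a hypothetical harvester, not an ORA-provided client: the function names, the 60-second timeout and the default oai_dc prefix are assumptions, and OAI-PMH error responses (such as noRecordsMatch) are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://ora.ox.ac.uk/oai2"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_fetch(params):
    """GET one OAI-PMH response page and return the raw XML bytes."""
    with urlopen(BASE_URL + "?" + urlencode(params), timeout=60) as response:
        return response.read()


def harvest_all(metadata_prefix="oai_dc", fetch=http_fetch):
    """Yield every <record> element, following resumptionTokens.

    No from/until range is sent, matching the recommended full harvest.
    fetch is injectable so the paging logic can be tested offline.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        yield from root.iterfind("oai:ListRecords/oai:record", NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", NS)
        # An absent or empty resumptionToken marks the last page.
        if token is None or not (token.text or "").strip():
            return
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing a different fetch callable lets a harvester add throttling or retries without changing the paging logic.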

Once an initial harvest has been completed, the copy can be kept up to date by making incremental harvesting requests with the from argument set to the date of the last harvest. Take the from value from the last server response (its responseDate element) rather than from the harvester's own clock, and do not set the until argument.
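A minimal Python sketch of the incremental step: read responseDate from the last response of the previous harvest and use it as the from argument, leaving until unset. Function names are hypothetical. OAI-PMH requires from to match the repository's granularity, so the optional truncation to a date applies only if the Identify response reports YYYY-MM-DD granularity.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def response_date(response_xml):
    """Return the <responseDate> (UTC) of an OAI-PMH response."""
    value = ET.fromstring(response_xml).findtext("oai:responseDate", namespaces=NS)
    if value is None:
        raise ValueError("response has no <responseDate>")
    return value.strip()


def incremental_params(last_response_date, metadata_prefix="oai_dc", day_granularity=False):
    """ListRecords arguments for an incremental harvest.

    Only 'from' is set; 'until' is deliberately left out. With day
    granularity, the timestamp is cut to its YYYY-MM-DD date part.
    """
    start = last_response_date[:10] if day_granularity else last_response_date
    return {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": start}
```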

Sets

ORA records are grouped into sets for selective harvesting, based on their 'Type of work' within the ORA system (e.g. 'thesis', 'dataset', 'journal article'). You can request a list of all supported sets with the ListSets verb.

https://ora.ox.ac.uk/oai2/?verb=ListSets
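A Python sketch that reads the available sets and builds a ListRecords request restricted to one of them through the standard set argument. Function names are hypothetical, and no particular ORA setSpec values are assumed; read them from the ListSets response.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_sets(list_sets_xml):
    """Return (setSpec, setName) pairs from one ListSets response page.

    OAI-PMH allows a long set list to be split across pages with a
    resumptionToken, which is followed the same way as for ListRecords.
    """
    root = ET.fromstring(list_sets_xml)
    return [
        (s.findtext("oai:setSpec", namespaces=NS), s.findtext("oai:setName", namespaces=NS))
        for s in root.iterfind("oai:ListSets/oai:set", NS)
    ]


def set_harvest_params(set_spec, metadata_prefix="oai_dc"):
    """ListRecords arguments restricted to a single set."""
    return {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
```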

Update schedule

New records are made available immediately on publication.

Record deletion policy

The ORA OAI-PMH service does not keep information about deletions. When a record is deleted from the ORA system, it is removed from the OAI-PMH service immediately, so harvesters do not receive a deleted-record notice for it.
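Because deletions are not reported, one possible approach, offered as a suggestion rather than a supported feature, is to compare the identifiers from a fresh full ListIdentifiers harvest with those held in the local copy. A minimal Python sketch with hypothetical function names:

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identifiers_on_page(list_identifiers_xml):
    """Identifiers listed on one ListIdentifiers response page."""
    root = ET.fromstring(list_identifiers_xml)
    return [
        header.findtext("oai:identifier", namespaces=NS)
        for header in root.iterfind("oai:ListIdentifiers/oai:header", NS)
    ]


def find_removed(local_ids, current_ids):
    """Identifiers held locally that the service no longer exposes."""
    return set(local_ids) - set(current_ids)
```

The current identifiers must come from a complete harvest (all pages, no date range); an incremental harvest only lists changed records and would wrongly flag everything else as removed.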

Service availability

When required, ORA carries out scheduled maintenance on Tuesday mornings between 07:00 and 09:00 (UK time). This may make the OAI-PMH service unavailable for short periods.
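A small Python helper, assuming Python 3.9+ for zoneinfo and an available IANA time-zone database, that checks whether a UK local time falls inside the Tuesday 07:00 to 09:00 window, for example to avoid starting a long harvest then. The function names are hypothetical. Because maintenance only happens when required, a harvester should still retry on errors at other times.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def uk_now():
    """Current time in UK local time (needs the IANA tz database)."""
    return datetime.now(ZoneInfo("Europe/London"))


def in_maintenance_window(uk_time):
    """True if uk_time is a Tuesday between 07:00 and 09:00.

    uk_time must already be expressed in UK local time.
    """
    return uk_time.weekday() == 1 and 7 <= uk_time.hour < 9
```

Usage: check in_maintenance_window(uk_now()) before starting a harvest, and wait until after 09:00 if it returns True.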

Identifiers

Internal ORA identifiers (record identifiers) are in the form uuid:12345678-1234-1234-1234-123456789abc.

ORA OAI-PMH identifiers are in the format oai_scheme:repository_identifier:record_identifier, e.g. oai:ora.ox.ac.uk:uuid:000d2073-9081-4a5b-b238-021cc7178e49. This is a change from the previous ORA OAI-PMH endpoint, where identifiers did not have the OAI scheme or Repository Identifier prefixes.

Harvesters that used the previous endpoint can map their stored identifiers to the new format by adding the prefix oai:ora.ox.ac.uk: to each one.
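The identifier mapping can be sketched in Python as follows. The function names are hypothetical; the prefix is the one shown in the example identifier above.

```python
ORA_PREFIX = "oai:ora.ox.ac.uk:"


def to_oai_identifier(identifier):
    """Map an old-style identifier (uuid:...) to the current OAI-PMH form.

    Identifiers that already carry the prefix are returned unchanged.
    """
    if identifier.startswith(ORA_PREFIX):
        return identifier
    return ORA_PREFIX + identifier


def split_oai_identifier(oai_identifier):
    """Split oai_scheme:repository_identifier:record_identifier into parts.

    maxsplit=2 keeps the colon inside the uuid:... record identifier.
    """
    scheme, repository, record = oai_identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an OAI identifier: {oai_identifier!r}")
    return scheme, repository, record
```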

Metadata formats and downstream targets

Metadata for each item (record) is available in several formats. Not all formats are supported for all records.

You may request a list of all the metadata formats supported with the ListMetadataFormats verb.

https://ora.ox.ac.uk/oai2/?verb=ListMetadataFormats
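A Python sketch for reading the metadataPrefix values from a ListMetadataFormats response. The optional identifier argument is part of OAI-PMH and asks for the formats available for a single record, which is useful because not all formats apply to every record. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://ora.ox.ac.uk/oai2"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def formats_url(identifier=None):
    """ListMetadataFormats URL, optionally limited to one record."""
    params = {"verb": "ListMetadataFormats"}
    if identifier is not None:
        params["identifier"] = identifier
    return BASE_URL + "?" + urlencode(params)


def metadata_prefixes(formats_xml):
    """metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(formats_xml)
    path = "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix"
    return [prefix.text.strip() for prefix in root.iterfind(path, NS)]
```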

Format	Metadata Prefix	Restriction	Description
OAI DC	oai_dc		OAI-PMH standard Dublin Core (DC).
DataCite	datacite_dc	Datasets only	Metadata format for the DataCite 4.6 metadata schema.
SOLO	solo_dc		Customised DC format for the Oxford University SOLO service.
BASE	base_dc		Customised DC format for the BASE service.
OpenAIRE	oai_openaire		Customised metadata format for the OpenAIRE Literature Repository Guidelines v4.0.
EThOS	uketd_dc	Theses only	Extended DC format for the EThOS service.
RIOXX Terms	rioxx_terms		Metadata format for the RIOXX V2 Metadata application profile. This format adds deposit and record publication dates in support of UKRI and CORE recommendations, using the RIOXX V3 Beta formats for these fields.
RIOXX Terms CC0	rioxx_terms_cc0		The rioxx_terms_cc0 metadata format is released under a CC0 licence. It is identical to the rioxx_terms metadata format, except that abstracts/summary descriptions are not included.
RIOXX Terms Version 3	rioxx_terms_v3		Metadata format for the RIOXX V3 Metadata application profile.
Unpaywall DC	unpaywall_dc		Customised DC format for Unpaywall to support file version information.
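Per the OAI-PMH specification, requesting a record in a format it does not support (for example uketd_dc for an item that is not a thesis) is expected to return a cannotDisseminateFormat error. A hypothetical Python sketch that retries in oai_dc when that happens; the fetch callable and the function names are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://ora.ox.ac.uk/oai2"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def get_record_url(identifier, metadata_prefix):
    """GetRecord URL for one item in one metadata format."""
    params = {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": metadata_prefix}
    return BASE_URL + "?" + urlencode(params)


def oai_error_code(response_xml):
    """Return the code attribute of an OAI-PMH <error>, or None if absent."""
    error = ET.fromstring(response_xml).find("oai:error", NS)
    return None if error is None else error.get("code")


def fetch_with_fallback(fetch, identifier, preferred, fallback="oai_dc"):
    """Try the preferred format; fall back if the record cannot be disseminated in it.

    fetch is any callable that takes a URL and returns the response body.
    """
    body = fetch(get_record_url(identifier, preferred))
    if oai_error_code(body) == "cannotDisseminateFormat":
        body = fetch(get_record_url(identifier, fallback))
    return body
```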