API
The Sickle Client
class sickle.app.Sickle(endpoint, http_method='GET', protocol_version='2.0', iterator=<class 'sickle.iterator.OAIItemIterator'>, max_retries=0, retry_status_codes=None, default_retry_after=60, class_mapping=None, encoding=None, **request_args)
Client for harvesting OAI interfaces.
Use it like this:
>>> sickle = Sickle('http://elis.da.ulcc.ac.uk/cgi/oai2')
>>> records = sickle.ListRecords(metadataPrefix='oai_dc')
>>> records.next()
<Record oai:eprints.rclis.org:3780>
Parameters:
- endpoint (str) – The endpoint of the OAI interface.
- http_method (str) – Method used for requests (GET or POST, default: GET).
- protocol_version (str) – The OAI protocol version.
- iterator – The type of the returned iterator (default: sickle.iterator.OAIItemIterator).
- max_retries (int) – Number of retry attempts if an HTTP request fails (default: 0, i.e., the request is sent only once). Sickle uses the value of the Retry-After header (if present) as the number of seconds to wait between retries.
- retry_status_codes (iterable) – HTTP status codes that trigger a retry (default: retry only on 503).
- default_retry_after (int) – Number of seconds to wait between retries when the response has no Retry-After header (default: 60).
- class_mapping (dict) – A dictionary that maps OAI verbs to classes representing OAI items. If not provided, sickle.app.DEFAULT_CLASS_MAPPING will be used.
- encoding (str) – Overrides the encoding used to decode the server response. If not specified, requests uses the encoding given in the Content-Type header of the response; if that header lacks charset information, requests falls back to 'ISO-8859-1'.
- request_args – Keyword arguments passed to requests when issuing HTTP requests. Useful examples are auth=('username', 'password') for endpoints protected by basic authentication, or timeout=<int>. See the documentation of requests for all available parameters.
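The three retry parameters above work together: a request is always sent once, and up to max_retries further attempts follow while the status is in retry_status_codes, each preceded by a wait taken from Retry-After or from default_retry_after. The stdlib-only sketch below illustrates those documented semantics; it is not Sickle's own code. The names retry_delay and request_with_retries, and the injected send and sleep callables, are hypothetical, and the sketch only parses the delta-seconds form of Retry-After (an HTTP-date value falls back to the default).

```python
import time
from typing import Callable, Collection, Iterable, Mapping, Optional, Tuple

# Response shape assumed by this sketch: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def retry_delay(status: int,
                headers: Mapping[str, str],
                retry_status_codes: Optional[Collection[int]] = None,
                default_retry_after: int = 60) -> Optional[int]:
    """Return the seconds to wait before retrying, or None if no retry applies."""
    # Documented default: only 503 triggers a retry.
    codes = retry_status_codes if retry_status_codes is not None else (503,)
    if status not in codes:
        return None
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after", "").strip()
    # Only the delta-seconds form is parsed; anything else uses the default.
    return int(value) if value.isdecimal() else default_retry_after


def request_with_retries(send: Callable[[], Response],
                         max_retries: int = 0,
                         retry_status_codes: Optional[Iterable[int]] = None,
                         default_retry_after: int = 60,
                         sleep: Callable[[float], None] = time.sleep) -> Response:
    """Send once, then retry at most max_retries times while the status is retryable."""
    codes = None if retry_status_codes is None else frozenset(retry_status_codes)
    response = send()
    for _ in range(max_retries):
        delay = retry_delay(response[0], response[1], codes, default_retry_after)
        if delay is None:
            break  # success or a non-retryable status
        sleep(delay)
        response = send()
    return response


# Usage with a fake transport: one 503 carrying Retry-After, then a success.
replies = iter([(503, {"Retry-After": "2"}, ""), (200, {}, "<OAI-PMH/>")])
waits = []
status, _, body = request_with_retries(lambda: next(replies), max_retries=3,
                                       sleep=waits.append)
```

Injecting sleep keeps the sketch testable without real delays; with max_retries=0 the loop body never runs, matching "request only once".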
- last_response
Contains the last response that has been received.
- GetRecord(**kwargs)
Issue a GetRecord request.
Return type: sickle.models.Record
- Identify()
Issue an Identify request.
Return type: sickle.models.Identify
- ListIdentifiers(ignore_deleted=False, **kwargs)
Issue a ListIdentifiers request.
Parameters: ignore_deleted – If set to True, the resulting iterator will skip records flagged as deleted.
Return type: sickle.iterator.BaseOAIIterator
- ListMetadataFormats(**kwargs)
Issue a ListMetadataFormats request.
Return type: sickle.iterator.BaseOAIIterator
- ListRecords(ignore_deleted=False, **kwargs)
Issue a ListRecords request.
Parameters: ignore_deleted – If set to True, the resulting iterator will skip records flagged as deleted.
Return type: sickle.iterator.BaseOAIIterator
- ListSets(**kwargs)
Issue a ListSets request.
Return type: sickle.iterator.BaseOAIIterator
- harvest(**kwargs)
Make HTTP requests to the OAI server.
Parameters: kwargs – OAI HTTP parameters.
Return type: sickle.OAIResponse
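The verb methods above all end up as OAI-PMH requests whose parameters are the verb plus the keyword arguments, sent with the configured http_method. The stdlib-only sketch below shows how such parameters could be encoded: in the query string for GET, or as an application/x-www-form-urlencoded body for POST, which is what OAI-PMH prescribes. The helper build_oai_request is hypothetical and not part of Sickle.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def build_oai_request(endpoint: str, http_method: str = "GET",
                      **params: str) -> Tuple[str, str, Optional[bytes]]:
    """Return (method, url, body) for an OAI-PMH request (illustrative only)."""
    query = urlencode(params)  # keeps keyword order and percent-encodes values
    method = http_method.upper()
    if method == "GET":
        # GET: the parameters travel in the query string.
        return method, f"{endpoint}?{query}", None
    if method == "POST":
        # POST: the same parameters go into a form-encoded request body.
        return method, endpoint, query.encode("ascii")
    raise ValueError(f"unsupported HTTP method: {http_method}")


# Roughly what ListRecords(metadataPrefix='oai_dc') has to send:
method, url, body = build_oai_request(
    "http://elis.da.ulcc.ac.uk/cgi/oai2", verb="ListRecords", metadataPrefix="oai_dc")
```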
Working with OAI Responses
class sickle.response.OAIResponse(http_response, params)
A response from an OAI server.
Provides access to the returned data on different abstraction levels.
Parameters:
- http_response – The original HTTP response.
- params (dict) – The OAI parameters for the request.
- raw
The server’s response as unicode.
- xml
The server’s response as parsed XML.
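The raw and xml attributes are two views of the same payload. As a stdlib-only sketch (Sickle itself works with lxml elements), the snippet below parses a hand-written, abbreviated OAI-PMH response and reads the echoed request element, its verb attribute, and the OAI error code if the server reported one. The sample XML and the helper summarize are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Abbreviated sample response, hand-written for illustration.
RAW = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://example.org/oai</request>
  <error code="noRecordsMatch">No records match the request.</error>
</OAI-PMH>"""


def summarize(raw: str) -> Dict[str, Optional[str]]:
    """Parse raw response text and pull out a few fields via the OAI namespace."""
    root = ET.fromstring(raw)  # the parsed counterpart of the raw text
    request = root.find("oai:request", OAI_NS)
    error = root.find("oai:error", OAI_NS)
    return {
        "verb": request.get("verb") if request is not None else None,
        "endpoint": request.text if request is not None else None,
        "error": error.get("code") if error is not None else None,
    }


info = summarize(RAW)
```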
Iterating over OAI Items
class sickle.iterator.OAIItemIterator(sickle, params, ignore_deleted=False)
Iterator over OAI records/identifiers/sets, transparently aggregated across consecutive OAI-PMH responses.
Use it to iterate through all records of a repository without handling paging yourself.
Parameters:
- sickle (sickle.app.Sickle) – The Sickle object that issued the first request.
- params (dict) – The OAI arguments.
- ignore_deleted (bool) – Flag for whether to ignore deleted records.
- sickle
The sickle.app.Sickle instance used for making requests to the server.
- verb
The OAI verb used for making requests to the server.
- element
The name of the OAI item to iterate on (record, header, set or metadataFormat).
- resumption_token
The content of the XML element resumptionToken from the last response.
- ignore_deleted
Flag for whether to skip records marked as deleted.
- next()
Return the next record/header/set.
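The attributes above describe how the iterator walks a list that spans several pages: after each response it reads resumptionToken and, if the token is non-empty, requests the next page with only the verb and that token, since OAI-PMH treats resumptionToken as an exclusive argument. The stdlib-only sketch below reproduces that loop against canned pages. The function iterate_items and the fetch callable are hypothetical stand-ins for Sickle's internals; deletion is detected through the header's status="deleted" attribute as defined by OAI-PMH.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

NS = "{http://www.openarchives.org/OAI/2.0/}"


def iterate_items(fetch: Callable[[Dict[str, str]], str],
                  params: Dict[str, str],
                  element: str = "record",
                  ignore_deleted: bool = False) -> Iterator[ET.Element]:
    """Yield OAI items across pages by following resumption tokens."""
    verb = params["verb"]
    while True:
        root = ET.fromstring(fetch(params))
        for item in root.iter(NS + element):
            # Records contain a header; in ListIdentifiers the item is the header.
            header = item if element == "header" else item.find(NS + "header")
            deleted = header is not None and header.get("status") == "deleted"
            if ignore_deleted and deleted:
                continue
            yield item
        token = root.find(f".//{NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # a missing or empty token marks the last page
        # Follow-up requests carry only the verb and the token.
        params = {"verb": verb, "resumptionToken": token.text.strip()}


# Canned two-page ListRecords exchange (hand-written sample data).
PAGES = {
    None: """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:x:1</identifier></header></record>
<record><header status="deleted"><identifier>oai:x:2</identifier></header></record>
<resumptionToken>tok1</resumptionToken></ListRecords></OAI-PMH>""",
    "tok1": """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:x:3</identifier></header></record>
<resumptionToken/></ListRecords></OAI-PMH>""",
}
requests_seen = []


def fake_fetch(params: Dict[str, str]) -> str:
    requests_seen.append(dict(params))
    return PAGES[params.get("resumptionToken")]


ids = [r.find(NS + "header/" + NS + "identifier").text
       for r in iterate_items(fake_fetch, {"verb": "ListRecords", "metadataPrefix": "oai_dc"},
                              ignore_deleted=True)]
```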
Iterating over OAI Responses
Classes for OAI Items
The following classes represent OAI-specific items like records, headers, and sets.
All items feature the attributes raw and xml, which contain their original XML representation as unicode and as parsed XML objects, respectively.
Note
Sickle’s automatic mapping of XML to OAI objects only works for Dublin Core encoded record data.
Identify Object
The Identify object is generated from Identify responses and is returned by sickle.app.Sickle.Identify(). It contains general information about the repository.
class sickle.models.Identify(identify_response)
Represents an Identify container.
This object differs from the other entities in that it has to be created from a sickle.response.OAIResponse instead of an XML element.
Parameters: identify_response (sickle.OAIResponse) – The response for an Identify request.
Note
As the attributes of this class are auto-generated from the Identify XML elements, some of them may be missing for specific OAI interfaces.
- adminEmail
The content of the element adminEmail, normally the e-mail address of the repository’s administrative contact.
- baseURL
The content of the element baseURL, which is the URL of the repository’s OAI endpoint.
- repositoryName
The content of the element repositoryName, which contains the name of the repository.
- deletedRecord
The content of the element deletedRecord, which indicates whether and how the repository keeps track of deleted records.
- delimiter
The content of the element delimiter.
- description
The content of the element description, which contains a description of the repository.
- earliestDatestamp
The content of the element earliestDatestamp, which indicates the datestamp of the oldest record in the repository.
- granularity
The content of the element granularity, which indicates the finest datestamp granularity supported by the repository (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ).
- oai_identifier
The content of the element oai-identifier.
Note
oai-identifier is not a valid name in Python, so the hyphen is replaced with an underscore in the attribute name.
- protocolVersion
The content of the element protocolVersion, which indicates the version of the OAI protocol implemented by the repository.
- repositoryIdentifier
The content of the element repositoryIdentifier.
- sampleIdentifier
The content of the element sampleIdentifier, which usually contains an example of an identifier used by this repository.
- scheme
The content of the element scheme.
- raw
The original XML as unicode.
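Because the Identify attributes are generated from whatever elements the server returns, a repository that omits an optional element simply lacks that attribute, so reading attributes with getattr and a default is the safe choice. The sketch below imitates that behavior with the standard library: it walks every element below Identify (including those nested in description, such as the oai-identifier block), strips namespaces, replaces hyphens with underscores, and keeps the first text value. The helper identify_attributes and the sample XML are illustrative assumptions, not Sickle's implementation.

```python
import re
import xml.etree.ElementTree as ET
from types import SimpleNamespace

# Hand-written sample Identify response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><Identify>
<repositoryName>Example Repository</repositoryName>
<baseURL>http://example.org/oai</baseURL>
<protocolVersion>2.0</protocolVersion>
<adminEmail>admin@example.org</adminEmail>
<earliestDatestamp>2000-01-01</earliestDatestamp>
<deletedRecord>persistent</deletedRecord>
<granularity>YYYY-MM-DD</granularity>
<description>
<oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
<scheme>oai</scheme>
<repositoryIdentifier>example.org</repositoryIdentifier>
<delimiter>:</delimiter>
<sampleIdentifier>oai:example.org:123</sampleIdentifier>
</oai-identifier>
</description>
</Identify></OAI-PMH>"""


def identify_attributes(raw: str) -> SimpleNamespace:
    """Build an attribute namespace from the elements below <Identify>."""
    root = ET.fromstring(raw)
    identify = root.find("{http://www.openarchives.org/OAI/2.0/}Identify")
    attrs = {}
    for element in identify.iter():
        if element is identify:
            continue  # iter() includes the container itself; skip it
        name = re.sub(r"^\{.*?\}", "", element.tag)  # strip the namespace
        name = name.replace("-", "_")  # oai-identifier -> oai_identifier
        text = (element.text or "").strip() or None
        attrs.setdefault(name, text)  # keep the first occurrence only
    return SimpleNamespace(**attrs)


info = identify_attributes(SAMPLE)
# compression is optional in OAI-PMH and absent from this sample.
compression = getattr(info, "compression", None)
```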
Record Object
Record objects represent single OAI records.
class sickle.models.Record(record_element, strip_ns=True)
Represents an OAI record.
Parameters:
- record_element (lxml.etree._Element) – The XML element ‘record’.
- strip_ns – Flag for whether to remove the namespaces from the element names.
- header
Contains the record header represented as a sickle.models.Header object.
- deleted
A boolean flag that indicates whether this record is deleted.
- raw
The original XML as unicode.
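The Dublin Core note earlier in this section matters for records: a flat format such as oai_dc maps naturally onto a dictionary from element names to lists of values, since fields like creator may repeat. The stdlib-only sketch below shows that mapping, with namespaces stripped in the spirit of strip_ns=True, plus the deleted flag taken from the header. The helper parse_record and its dict-of-lists output shape are assumptions chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample record in oai_dc.
RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
<header><identifier>oai:example.org:1</identifier><datestamp>2020-05-01</datestamp></header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>An Example</dc:title>
<dc:creator>Doe, Jane</dc:creator>
<dc:creator>Roe, Richard</dc:creator>
</oai_dc:dc>
</metadata>
</record>"""


def parse_record(raw: str) -> Tuple[bool, Dict[str, List[str]]]:
    """Return (deleted, metadata) for one OAI record element."""
    record = ET.fromstring(raw)
    header = record.find(OAI + "header")
    deleted = header is not None and header.get("status") == "deleted"
    metadata: Dict[str, List[str]] = {}
    container = record.find(OAI + "metadata")
    if container is not None and len(container):
        # The first child is the format wrapper (oai_dc:dc); its children are the fields.
        for field in container[0]:
            name = re.sub(r"^\{.*?\}", "", field.tag)  # strip the namespace
            metadata.setdefault(name, []).append((field.text or "").strip())
    return deleted, metadata


deleted, metadata = parse_record(RECORD)
```

Deleted records carry no metadata element in OAI-PMH, which is why the sketch returns an empty dictionary for them.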
Header Object
Header objects represent OAI headers.
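In OAI-PMH, a header carries the item's identifier, its datestamp, zero or more setSpec elements, and optionally status="deleted". The stdlib-only sketch below reads those fields into a small dataclass. HeaderInfo and parse_header are hypothetical names, and the field names are assumptions rather than the documented attributes of sickle.models.Header.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

OAI = "{http://www.openarchives.org/OAI/2.0/}"


@dataclass
class HeaderInfo:
    identifier: str
    datestamp: str
    set_specs: List[str] = field(default_factory=list)
    deleted: bool = False


def parse_header(raw: str) -> HeaderInfo:
    """Read the standard OAI-PMH header fields from a <header> element."""
    header = ET.fromstring(raw)
    return HeaderInfo(
        identifier=header.findtext(OAI + "identifier", default=""),
        datestamp=header.findtext(OAI + "datestamp", default=""),
        # setSpec may repeat, once per set the item belongs to.
        set_specs=[s.text or "" for s in header.findall(OAI + "setSpec")],
        deleted=header.get("status") == "deleted",
    )


h = parse_header(
    '<header xmlns="http://www.openarchives.org/OAI/2.0/" status="deleted">'
    "<identifier>oai:example.org:7</identifier><datestamp>2021-03-04</datestamp>"
    "<setSpec>math</setSpec><setSpec>physics</setSpec></header>"
)
```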
Set Object
MetadataFormat Object
class sickle.models.MetadataFormat(mdf_element)
Represents an OAI MetadataFormat.
Parameters: mdf_element (lxml.etree._Element) – The XML element ‘metadataFormat’.
- metadataPrefix
The prefix used to identify this format.
- metadataNamespace
The namespace URL for this format.
- schema
The URL to the schema file of this format.
- raw
The original XML as unicode.
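Each metadataFormat entry of a ListMetadataFormats response corresponds to one of these objects. As a stdlib-only illustration, the sketch below extracts the three documented fields (metadataPrefix, schema, metadataNamespace) from a hand-written response; list_formats and the sample are hypothetical. A client would typically choose a prefix from this list before passing it as metadataPrefix to ListRecords.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample ListMetadataFormats response.
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListMetadataFormats>
<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>
</ListMetadataFormats></OAI-PMH>"""


def list_formats(raw: str) -> List[Tuple[str, str, str]]:
    """Return (metadataPrefix, schema, metadataNamespace) for each format."""
    root = ET.fromstring(raw)
    return [
        (
            fmt.findtext(OAI + "metadataPrefix", default=""),
            fmt.findtext(OAI + "schema", default=""),
            fmt.findtext(OAI + "metadataNamespace", default=""),
        )
        for fmt in root.iter(OAI + "metadataFormat")
    ]


formats = list_formats(RESPONSE)
prefixes = [prefix for prefix, _, _ in formats]
```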