MOD-OAI-PMH documentation

The module is responsible for metadata harvesting from the FOLIO platform.

It supports six verbs, each with its own purpose, although some of them are similar.

Supported verbs

  • Identify
    Returns general information about the repository 
    URI {{host}}:{{port}}/oai/records?verb=Identify

    Response example


  • ListMetadataFormats
    Returns the list of metadata formats supported for record responses.
    Has an optional parameter, identifier, which points to a specific record. When it is provided, the list of formats supported for that specific record is returned.
    If a record with this id isn't found in mod-source-record-storage, an empty ListMetadataFormats response is returned.
    URI {{host}}:{{port}}/oai/records?verb=ListMetadataFormats

    Response example 


    Records response examples depending on metadata format specified

    The difference between marc21 and marc21_withholdings is that the latter contains additional 856 and 952 data fields with the holdings and items data.
    The oai_dc metadata prefix doesn't support appending the holdings and items data; only marc21_withholdings does.

  • ListRecords - in some sense the main verb. 

    Returns the collection of records, that is, the FOLIO record data itself (SRS records, plus inventory data when the marc21_withholdings metadata prefix is used).
    URI {{host}}:{{port}}/oai/records?metadataPrefix=marc21&verb=ListRecords

    Mandatory params: metadataPrefix or resumptionToken.
    Optional params - from, until, set (can be provided only once, within the first request; for details read the section below)
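
    A minimal sketch of how a client could assemble the first ListRecords request from the parameters listed above. The class name, the method name and the base URL value are hypothetical and only illustrate the shape of the query string; they are not part of mod-oai-pmh.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical client-side helper, not part of mod-oai-pmh.
final class ListRecordsUrl {

    // baseUrl stands for {{host}}:{{port}}; from, until and set are optional (pass null to omit).
    static String initial(String baseUrl, String metadataPrefix,
                          String from, String until, String set) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("verb", "ListRecords");
        params.put("metadataPrefix", metadataPrefix);
        if (from != null) params.put("from", from);
        if (until != null) params.put("until", until);
        if (set != null) params.put("set", set);
        // Encode each value so that reserved characters survive the query string.
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return baseUrl + "/oai/records?" + query;
    }

    public static void main(String[] args) {
        // Example values are assumptions for illustration only.
        System.out.println(initial("http://localhost:8081", "marc21", "2023-01-01", null, null));
    }
}
```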

    Initial request and resumption token

    The first (initial) request sent to oai-pmh acts as a configuration request: it defines the filtering parameters and the view of the records that are going to be harvested.
    The metadata prefix and the optional parameters are specified within the first request. The first response contains the batch of matched records and, if more than one batch is available to be harvested, a resumption token (RT).
    Each subsequent request (2..n) requires only the RT from the previous response. No other parameters need to be specified; if they are, the request is rejected as invalid. The RT carries the configuration parameters from the initial request and a pointer to the next batch of records.
    The size of the batch is set by the "max records per response" setting (you can find more about oai-pmh settings here)
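
    The request/response cycle described above can be sketched as a simple client loop. This is an illustration under assumptions: the class is hypothetical, the fetch function stands in for an HTTP GET that returns the response body, and the token is located with a plain regular expression instead of a real XML parser.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harvesting client, not part of mod-oai-pmh.
final class HarvestLoop {

    // Matches a resumptionToken element that has a non-empty value.
    private static final Pattern TOKEN =
            Pattern.compile("<resumptionToken[^>]*>([^<]+)</resumptionToken>");

    static List<String> harvest(String baseUrl, String metadataPrefix,
                                Function<String, String> fetch) {
        List<String> responses = new ArrayList<>();
        // Initial request: the metadata prefix (and any optional filters) go here.
        String url = baseUrl + "/oai/records?verb=ListRecords&metadataPrefix=" + metadataPrefix;
        while (url != null) {
            String body = fetch.apply(url);
            responses.add(body);
            Matcher m = TOKEN.matcher(body);
            // Follow-up requests carry only the verb and the RT from the previous response.
            url = m.find()
                    ? baseUrl + "/oai/records?verb=ListRecords&resumptionToken="
                        + URLEncoder.encode(m.group(1).trim(), StandardCharsets.UTF_8)
                    : null;
        }
        return responses;
    }
}
```

    The loop stops as soon as a response has no resumption token value, which is how the sketch recognizes the last batch.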

          Deleted records support

          The "Deleted records support" setting defines whether instances marked as deleted are processed.
          The "Suppressed records processing" setting defines whether instances marked as suppressed from discovery are processed.
          For the default values of the above settings and the related behavior, read here.


ListRecords marc21
As a header identifier, the instance id is used.
Depending on metadataSource, returns either the plain MARC records from mod-source-record-storage (record type MARC) or the inventory instances, converted to XML representation.

     

metadataSource = Inventory

OAI-PMH calls the inventory (/instance-storage/instances?query=...) with one query per batch, where size=repository.maxRecordsPerResponse. Besides the limit, the query takes into account repository.suppressedRecordsProcessing and repository.deletedRecords (both taken from the configuration). These parameters relate to each other as follows:

    var discoverySuppress = nonNull(deletedRecordsSupport ? null : suppressedRecordsSupport);
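
To see what this expression yields, the sketch below evaluates it for every combination of the two settings. It reads the expression exactly as quoted and assumes that both settings are plain booleans; the class and method names are hypothetical.

```java
import static java.util.Objects.nonNull;

// Hypothetical demo class; only the expression itself comes from the text above.
final class DiscoverySuppressDemo {

    // Assumption: both settings are plain (non-null) booleans.
    static boolean discoverySuppress(boolean deletedRecordsSupport,
                                     boolean suppressedRecordsSupport) {
        // The ternary yields null when deleted records are supported,
        // otherwise the boxed value of suppressedRecordsSupport.
        return nonNull(deletedRecordsSupport ? null : suppressedRecordsSupport);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean deleted : values) {
            for (boolean suppressed : values) {
                System.out.printf("deletedRecordsSupport=%b suppressedRecordsSupport=%b -> %b%n",
                        deleted, suppressed, discoverySuppress(deleted, suppressed));
            }
        }
    }
}
```

Under that assumption the result is true whenever deletedRecordsSupport is false and false otherwise, because nonNull only checks whether the ternary produced a value.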

After that, the instances are transformed to the corresponding XML format (marc21/dc).

metadataSource = SRS

OAI-PMH calls the SRS storage (/source-storage/source-records?query=...) through the SRS client, which supports the deletedRecordsSupport and suppressedRecordsSupport options. Both the SRS and Inventory clients support the suppressedFromDiscovery and supportDeleted options. The next step depends on the value of withInventory. If it is true, OAI-PMH calls the inventory (/instance-storage/instances?query=...) to retrieve the corresponding instances. If a source record is not found but the instance is present, the instance is converted to marc21 or dc format and returned to the client. If both the instance and the source record are present, they are combined and returned in XML representation (marc21 or dc).

If withInventory=false, only records from SRS are downloaded and transformed to the corresponding XML format (marc21/dc).
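
The branching described for metadataSource = SRS can be summarized in a small decision function. This is a sketch, not the module's code: the class, the enum and the string placeholders for record payloads are hypothetical, and the case "source record found, instance missing, withInventory=true" is not covered by the text, so its handling here is an assumption.

```java
import java.util.Optional;

// Hypothetical illustration of the branching; not part of mod-oai-pmh.
final class RecordSourceResolver {

    enum Outcome { SRS_ONLY, INSTANCE_ONLY, COMBINED, NOTHING }

    // srsRecord and instance are plain-string placeholders for the real payloads.
    static Outcome resolve(boolean withInventory,
                           Optional<String> srsRecord,
                           Optional<String> instance) {
        if (!withInventory) {
            // Inventory is not consulted: only SRS records are used.
            return srsRecord.isPresent() ? Outcome.SRS_ONLY : Outcome.NOTHING;
        }
        if (srsRecord.isPresent() && instance.isPresent()) {
            return Outcome.COMBINED;       // both present: combine them
        }
        if (instance.isPresent()) {
            return Outcome.INSTANCE_ONLY;  // no source record: convert the instance
        }
        // Assumption: a source record without an instance is returned as is.
        return srsRecord.isPresent() ? Outcome.SRS_ONLY : Outcome.NOTHING;
    }
}
```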

ListRecords marc21_withholdings
As a header identifier, the instance id is used.
Returns the plain MARC records from mod-source-record-storage with record type MARC, enriched with inventory data (holdings and items) and converted to XML representation.

Marc21_withholdings business flow




The processing of a marc21_withholdings request is divided into two processes: downloading instance ids and processing records.

Downloading instance ids is a process that requests from inventory storage the ids of instances that meet the filtering criteria and saves them to the local module database together with a small set of required metadata.
The filtering criteria come from the parameters specified within the initial request and from oai-pmh settings such as "deleted records support" and "suppressed records processing".



The initial request triggers the download of instance ids. This process runs in the background and populates the database (/inventory-hierarchy/updated-instance-ids). It runs only once for each separate harvest. Each harvest has its own request id, which can be found in the request_metadata_lb table (you can find the request id of your harvest within the decoded resumption token). The table has a row with the request id of the harvest and the state of the instance id download process (in progress or already completed).
The "process records" process is triggered by each request and repeatedly asks the database for the already downloaded instance ids. While not enough instance ids have been downloaded, it queries the database every 500 ms until the download process has populated the DB with the number of instance ids required to process the batch.
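
The polling behavior described above can be sketched as follows. The names are hypothetical: loadIds stands in for the database query and downloadCompleted for the download state kept in request_metadata_lb. Returning a smaller batch once the download has completed is an assumption added so that the last batch of a harvest can terminate.

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.IntFunction;

// Hypothetical illustration of the polling loop; not the module's implementation.
final class InstanceIdPoller {

    // Polling interval taken from the text above.
    static final long POLL_INTERVAL_MS = 500;

    static List<String> awaitBatch(int batchSize,
                                   IntFunction<List<String>> loadIds,
                                   BooleanSupplier downloadCompleted,
                                   long pollIntervalMs) {
        while (true) {
            // Read the state first so that ids saved just before completion are not missed.
            boolean completed = downloadCompleted.getAsBoolean();
            List<String> ids = loadIds.apply(batchSize);
            if (ids.size() >= batchSize || completed) {
                return ids;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for instance ids", e);
            }
        }
    }
}
```

A caller would pass POLL_INTERVAL_MS as the last argument; it is a parameter only so that the loop can be exercised with a shorter interval.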

The "process records" process is fairly simple and can be described in the following steps:
1) Get instance ids from the DB, in the amount given by the "max records per response" setting
2) Request marc records from mod-source-record-storage (SRS) by the instance ids obtained in step 1.
    If SRS cannot send data for some reason, the oai-pmh logic asks it repeatedly until data is received or the request attempts are exhausted (50 attempts in total)
3) Filter out instances for which marc records weren't found (therefore instances with source FOLIO are omitted)
4) Request the holdings/items data from inventory (/inventory-hierarchy/items-and-holdings)
5) Populate SRS records with items/holdings data (856 and 952 fields)
6) Convert enriched marc records to XML representation
7) Send the response to the client
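
Steps 2 and 3 above can be sketched as follows. The class and method names are hypothetical, fetchFromSrs stands in for the call to SRS and returns an empty Optional when SRS could not send data, and records are represented as plain strings keyed by instance id. The text does not say whether there is a delay between attempts, so the sketch retries immediately.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration of steps 2 and 3; not the module's implementation.
final class ProcessRecordsSketch {

    // Attempt limit taken from step 2.
    static final int MAX_ATTEMPTS = 50;

    // Step 2: ask SRS repeatedly until data is received or the attempts are exhausted.
    static Map<String, String> fetchWithRetry(
            List<String> instanceIds,
            Function<List<String>, Optional<Map<String, String>>> fetchFromSrs) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            Optional<Map<String, String>> result = fetchFromSrs.apply(instanceIds);
            if (result.isPresent()) {
                return result.get();
            }
        }
        throw new IllegalStateException("No data from SRS after " + MAX_ATTEMPTS + " attempts");
    }

    // Step 3: keep only the instances for which a marc record was found.
    static List<String> withMarcRecord(List<String> instanceIds,
                                       Map<String, String> marcByInstanceId) {
        return instanceIds.stream()
                .filter(marcByInstanceId::containsKey)
                .collect(Collectors.toList());
    }
}
```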

  • ListIdentifiers

    Very similar to ListRecords, with one difference: it returns only the record headers.
    Otherwise it behaves the same as the ListRecords verb. Supports only the marc21 and oai_dc metadata formats.
    As a header identifier, the marc record id is used.
    URI {{host}}:{{port}}/oai/records?metadataPrefix=marc21&verb=ListIdentifiers
    Mandatory params: metadataPrefix or resumptionToken
    Optional params - from, until, set.
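
    A minimal sketch of how a client could pull the header identifiers out of a ListIdentifiers response. It assumes the standard OAI-PMH layout, where each header element contains an identifier element without a namespace prefix; the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical client-side helper, not part of mod-oai-pmh.
final class HeaderIdentifiers {

    static List<String> extract(String xml) {
        try {
            // The default factory is not namespace-aware, so elements are matched by their tag name.
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("identifier");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent().trim());
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
    }
}
```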

    Response example 



  • GetRecord 
    Similar to ListRecords, but returns the metadata of only one record, selected by the specified identifier and rendered in the specified metadata format (supports all 3 metadata formats).
    Doesn't support the resumption token since it is not a list request.
    URI {{host}}:{{port}}/oai/records?verb=GetRecord&identifier=oai:folio.org:diku%2F{{marc_record_id}}&metadataPrefix=marc21

    Important note: the '/' sign must not appear as is within the record identifier since it confuses the API. Such signs should be passed encoded; the URL above demonstrates this, where %2F is an encoded '/' sign.

    Mandatory params: identifier, metadataPrefix
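
    A minimal sketch of building a GetRecord URL with the '/' sign encoded as %2F. The class name is hypothetical, the "oai:folio.org:" prefix is copied from the URI above, and treating "diku" as a tenant id is an assumption.

```java
// Hypothetical client-side helper, not part of mod-oai-pmh.
final class GetRecordUrl {

    // baseUrl stands for {{host}}:{{port}}; tenant and recordId are placeholders.
    static String build(String baseUrl, String tenant, String recordId, String metadataPrefix) {
        String identifier = "oai:folio.org:" + tenant + "/" + recordId;
        // Only the '/' sign is replaced, which matches the URI shown above.
        String encoded = identifier.replace("/", "%2F");
        return baseUrl + "/oai/records?verb=GetRecord&identifier=" + encoded
                + "&metadataPrefix=" + metadataPrefix;
    }

    public static void main(String[] args) {
        // Example values are assumptions for illustration only.
        System.out.println(build("http://localhost:8081", "diku", "1234", "marc21"));
    }
}
```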


    Response example




  • ListSets

    Returns set collection
    Sets are used to filter records, but the current implementation is a stub and will be enhanced.

    URI {{host}}:{{port}}/oai/records?verb=ListSets

    Response example