RTAC (RTAC | Holdings) performance improvement

This page proposes several options for improving RTAC performance (see MODRTAC-38).

Background (how the RTAC logic currently works)

  1. Go to /inventory/instances/<instance_id> to verify that the instance exists.
  2. Go to /holdings-storage/holdings and retrieve all holdings whose instanceId matches the requested Id.
  3. For each holdings record, go to /inventory/items and retrieve all items whose holdingsRecordId matches the holdings record Id.
  4. For each item, go to /circulation/loans and retrieve all loans whose itemId matches the item Id and whose status equals “Open”.

See the getRtacById method for details.
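The steps above can be sketched in Python as follows. This is a simplified illustration and not the actual getRtacById code. Several details are assumptions:

  • `fetch` is a hypothetical stand-in for a GET through Okapi that returns decoded JSON.
  • The response keys (`holdingsRecords`, `items`, `loans`) and the loan status field `status.name` are assumed.
  • URL encoding is omitted for readability.

What matters is the shape of the loops: one items request per holdings record and one loans request per item.

```python
from typing import Callable

# Hypothetical stand-in for a GET through Okapi: takes a path (with query)
# and returns the decoded JSON body. URL encoding is omitted for readability.
Fetch = Callable[[str], dict]


def get_rtac_baseline(instance_id: str, fetch: Fetch) -> list[dict]:
    """Naive traversal: one request per holdings record and one per item."""
    # 1. The instance must exist (a real client would handle a 404 here).
    fetch(f"/inventory/instances/{instance_id}")
    # 2. All holdings records of the instance in a single request.
    holdings = fetch(
        f"/holdings-storage/holdings?query=instanceId=={instance_id}"
    )["holdingsRecords"]
    result = []
    for holdings_record in holdings:
        # 3. One items request per holdings record.
        items = fetch(
            f"/inventory/items?query=holdingsRecordId=={holdings_record['id']}"
        )["items"]
        for item in items:
            # 4. One loans request per item (status field name assumed).
            loans = fetch(
                f"/circulation/loans?query=itemId=={item['id']}"
                " and status.name==Open"
            )["loans"]
            result.append({"item": item, "openLoans": loans})
    return result
```

Suppose a stub `fetch` counts calls and an instance has 2 holdings records with 3 items each. That instance triggers 1 + 1 + 2 + 6 = 10 requests.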

Performance testing of RTAC (see details in the report) showed that its response time is not acceptable, especially for the EDS UI. Therefore, the time needed to process an RTAC request should be reduced.

Big-O complexity in terms of REST requests (all of them are supposed to go through OKAPI and other ):
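The requests made by the background algorithm can be counted directly. Let H be the number of holdings records of the instance and I the total number of items in them (symbols introduced here for illustration):

```latex
N_{\text{baseline}} = \underbrace{1}_{\text{instance}} + \underbrace{1}_{\text{holdings}} + \underbrace{H}_{\text{items}} + \underbrace{I}_{\text{loans}} = 2 + H + I = O(H + I)
```

For example, an instance with 3 holdings records and 60 items needs 2 + 3 + 60 = 65 requests, and each of them passes through Okapi.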

1. Reduce calls by using CQL "or" statement

The first option uses the CQL “or” operator. The idea is to retrieve items and loans in as few calls as possible, instead of making one request per holdings record and one per item in a loop. The initial algorithm would change as follows:

  1. Go to /inventory/instances/<instance_id> to verify that the instance exists.
  2. Go to /holdings-storage/holdings and retrieve all holdings whose instanceId matches the requested Id.
  3. Join all holdings Ids into a CQL query with the “OR” operator and retrieve all items whose holdingsRecordId matches any of the provided holdings Ids.
  4. Join all item Ids into a CQL query with the “OR” operator and retrieve all loans whose itemId matches any of the provided item Ids and whose status equals “Open”.
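Steps 3 and 4 can be sketched in Python under a few assumptions:

  • The limit is roughly 50 UUIDs per query, as listed in the Cons.
  • `status.name` is the loan status field.
  • `MAX_IDS_PER_QUERY`, `PAGE_LIMIT` and the helper names are hypothetical.

Each batch of ids becomes one request, so the number of calls depends on the number of batches rather than on the number of records.

```python
from urllib.parse import urlencode

# Roughly how many UUIDs fit into one CQL query in a GET URL (see the Cons).
MAX_IDS_PER_QUERY = 50
# Hypothetical page size: a batched query can match more records than the
# default page returned by storage modules.
PAGE_LIMIT = 1000


def chunks(ids: list[str], size: int = MAX_IDS_PER_QUERY) -> list[list[str]]:
    """Split ids into consecutive batches of at most `size` elements."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def or_query(field: str, ids: list[str]) -> str:
    """Build a CQL query like 'field==a or field==b' for one batch."""
    return " or ".join(f"{field}=={ref_id}" for ref_id in ids)


def items_urls(holdings_ids: list[str]) -> list[str]:
    """Step 3: one /inventory/items request per batch of holdings ids."""
    return [
        "/inventory/items?" + urlencode(
            {"query": or_query("holdingsRecordId", batch), "limit": PAGE_LIMIT})
        for batch in chunks(holdings_ids)
    ]


def loans_urls(item_ids: list[str]) -> list[str]:
    """Step 4: one /circulation/loans request per batch of item ids."""
    return [
        "/circulation/loans?" + urlencode(
            {"query": f"({or_query('itemId', batch)}) and status.name==Open",
             "limit": PAGE_LIMIT})
        for batch in chunks(item_ids)
    ]
```

The explicit `limit` matters because a batched query can return many more records than a single-id query. The value 1000 is an arbitrary placeholder, and larger result sets would still need paging.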

Pros

  • Minimal effort on the implementation side
  • Doesn’t require modification of other modules

Cons

  • Improves performance less than a database view (option 2)
  • A CQL query is limited to approximately 50 UUIDs per request (this could be worked around by creating POST endpoints in other modules, but that burdens their implementors)
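The gain can be quantified by counting requests. Take an instance with H holdings records and I items, with at most 50 UUIDs per query (symbols introduced here for illustration):

```latex
N_{\text{OR}} = \underbrace{1}_{\text{instance}} + \underbrace{1}_{\text{holdings}} + \underbrace{\left\lceil H/50 \right\rceil}_{\text{items}} + \underbrace{\left\lceil I/50 \right\rceil}_{\text{loans}}
```

For 3 holdings records and 60 items, this gives 2 + 1 + 2 = 5 requests. Making one request per holdings record and per item would instead take 2 + 3 + 60 = 65.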

2. Create a database view in mod-inventory-storage that supports an RTAC API

A significant performance improvement could be achieved by creating a dedicated database view/function in the mod-inventory-storage module. This view would take an instance UUID as an input parameter and return all the information needed for the RTAC response. All filtering is done at the SQL level, which avoids the per-record REST round trips of the original algorithm. Only one call to the mod-inventory-storage API would be needed. The algorithm could be as follows:

  1. GET /inventory/rtac-view?instanceId=<instance_id>
  2. From the full representation, retrieve all items associated with the requested instance Id.
  3. Check the items’ loans via REST calls.
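The following self-contained SQLite sketch illustrates why one joined query is cheaper than a chain of REST calls. The tables, the `rtac_view` view and the `rtac_rows` helper are hypothetical and heavily simplified. mod-inventory-storage stores records as JSONB in PostgreSQL, so the real view or function would look different. Loans are deliberately absent, matching step 3 above.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables hold JSONB documents.
SCHEMA = """
CREATE TABLE instance (id TEXT PRIMARY KEY);
CREATE TABLE holdings_record (
    id TEXT PRIMARY KEY,
    instance_id TEXT REFERENCES instance(id)
);
CREATE TABLE item (
    id TEXT PRIMARY KEY,
    holdings_record_id TEXT REFERENCES holdings_record(id),
    status TEXT,
    effective_location_id TEXT
);
-- One view joins instance -> holdings -> items, so a single query replaces
-- the per-holdings REST calls of the original algorithm.
CREATE VIEW rtac_view AS
SELECT i.id AS instance_id, h.id AS holdings_id, it.id AS item_id,
       it.status, it.effective_location_id
FROM instance i
JOIN holdings_record h ON h.instance_id = i.id
JOIN item it ON it.holdings_record_id = h.id;
"""


def rtac_rows(conn: sqlite3.Connection, instance_id: str) -> list[tuple]:
    """Return every item of the instance with one query against the view."""
    return conn.execute(
        "SELECT holdings_id, item_id, status, effective_location_id "
        "FROM rtac_view WHERE instance_id = ? ORDER BY item_id",
        (instance_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO instance VALUES ('inst-1')")
    conn.execute("INSERT INTO holdings_record VALUES ('h1', 'inst-1')")
    conn.execute("INSERT INTO item VALUES ('it1', 'h1', 'Available', 'loc1')")
    print(rtac_rows(conn, "inst-1"))
```

Under these assumptions, `rtac_rows` returns the holdings id, item id, status and location of every item in one round trip. The loans check for the returned item ids still has to go through REST.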

Pros

  • Significant performance improvement due to database joins instead of REST calls
  • Allows retrieving only the necessary data (fields) from mod-inventory-storage

Cons

  • Requires a special data representation and a corresponding endpoint in inventory-storage, which (for now) only mod-rtac needs
  • Loans are stored in a different schema (diku_mod_circulation_storage), so step 4 of the original algorithm (retrieving loans) still has to be executed

Note: if both schemas are in the same database, a special database user could be created and granted access (GRANT) to the required schema and tables.

3. Use of GraphQL

This approach uses GraphQL to fetch item information. GraphQL is a query language for APIs, and it is implemented for FOLIO in the mod-graphql module. It has two advantages. A CQL query can be specified without length limitations, because the query is sent in the POST body. The response also contains exactly the set of fields that is needed. The following query could be used:

query ($requestedInstanceId:String) {
  instance_storage_instances(instanceId: $requestedInstanceId) {
    holdingsRecords2 {
      holdingsItems {
        id 
        effectiveLocationId
        effectiveCallNumberComponents {
          prefix
          callNumber
          suffix
        }
        status {
          name
        }
        permanentLoanTypeId
        temporaryLoanTypeId
        volume
        enumeration
        chronology
      }  
    }
  }
}
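The following Python sketches how mod-rtac could send this query. It prepares a POST to the /graphql endpoint using the usual GraphQL-over-HTTP JSON body (`query` plus `variables`) and the Okapi tenant and token headers. Several parts are placeholders or simplifications:

  • The Okapi URL, tenant and token are placeholders.
  • The query is trimmed to two item fields for brevity.
  • `build_graphql_request` is a hypothetical helper.

```python
import json
import urllib.request

# The query above, trimmed to two item fields for brevity.
RTAC_QUERY = """
query ($requestedInstanceId: String) {
  instance_storage_instances(instanceId: $requestedInstanceId) {
    holdingsRecords2 {
      holdingsItems { id status { name } }
    }
  }
}
"""


def build_graphql_request(okapi_url: str, tenant: str, token: str,
                          instance_id: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /graphql.

    The query and its variables travel in the JSON body, so the request is
    not constrained by URL length.
    """
    body = json.dumps({
        "query": RTAC_QUERY,
        "variables": {"requestedInstanceId": instance_id},
    }).encode("utf-8")
    return urllib.request.Request(
        f"{okapi_url}/graphql",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Okapi-Tenant": tenant,
            "X-Okapi-Token": token,
        },
    )
```

Sending the prepared request is then a matter of `urllib.request.urlopen(req)`.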

The high-level algorithm is presented below: 

  1. Send the query to /graphql with the instance id.
  2. Retrieve information on all items associated with the requested instance Id.
  3. Check the items’ loans via REST calls.
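Step 2 amounts to flattening the nested holdings → items structure of the response. The minimal sketch below assumes the response mirrors the query shape, with a single instance object under `data.instance_storage_instances`. If mod-graphql returns a list there, the access path changes. The function name is hypothetical.

```python
def items_from_response(response: dict) -> list[dict]:
    """Flatten holdings -> items from a /graphql response into one list.

    Assumes data.instance_storage_instances is a single instance object;
    missing or null holdings/items lists are treated as empty.
    """
    instance = response["data"]["instance_storage_instances"]
    return [
        item
        for holdings in instance.get("holdingsRecords2") or []
        for item in holdings.get("holdingsItems") or []
    ]
```

The ids of the flattened items (`[item["id"] for item in items]`) are what step 3 passes to the loans lookup.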

Pros

  • A proper implementation would allow entities to be joined efficiently

Cons

  • Currently at the proof-of-concept (POC) stage
  • The current implementation makes REST calls to inventory-storage for every entity
  • Performance of the current implementation is worse than that of CQL with OR

4. Create an API for getting items by instance ids

Provide additional API interfaces to navigate the inventory data model more efficiently, such as: 

GET /inventory-storage/itemsByInstanceID?instanceId=<instance_id>
  • The new API interface should return item information enriched with the actual names of referenced records wherever the item object stores only a UUID (e.g. { "id": "fcd64ce1-6995-48f0-840e-89ffa2288371", "name": "Main Library" }). That is, the response should be similar to the /inventory/items response.
  • It should also accept multiple instance ids (up to 50) in order to cover the requirements of MODRTAC-34.
  • Furthermore, an additional parameter (e.g. includeSuppressedFromDiscovery = {true, false}) should be introduced so that MODRTAC-35 can be implemented.
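The enrichment in the first bullet can be illustrated with a small sketch. Each UUID reference in a storage item is resolved against reference data and exposed as an object with `id` and `name`. The field-to-key mapping (`effectiveLocationId` → `effectiveLocation`, `permanentLoanTypeId` → `permanentLoanType`) and the function name are assumptions for illustration. In the proposed API, this lookup would happen at the database level rather than in application code.

```python
# Hypothetical mapping: UUID field in the storage record -> key of the
# enriched {"id", "name"} object in the API response.
ENRICHED_FIELDS = {
    "effectiveLocationId": "effectiveLocation",
    "permanentLoanTypeId": "permanentLoanType",
}


def enrich_item(item: dict, names_by_field: dict[str, dict[str, str]]) -> dict:
    """Return a copy of `item` where each known UUID reference gains an
    {"id", "name"} object resolved from reference data.

    `names_by_field` maps a field such as "effectiveLocationId" to a
    {uuid: name} lookup. The original UUID fields are kept.
    """
    enriched = dict(item)
    for field, target in ENRICHED_FIELDS.items():
        ref_id = item.get(field)
        if ref_id is None:
            continue  # the item does not reference this kind of record
        name = names_by_field.get(field, {}).get(ref_id)  # None if unknown
        enriched[target] = {"id": ref_id, "name": name}
    return enriched
```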

This would benefit other situations, beyond just RTAC, where the data model hierarchy needs to be traversed - especially if domain boundaries are not crossed.

This would also reduce the problem to the following:
instanceID → (instance → holdings → items) → loans
where the traversal in parentheses is implemented at the database level and surfaced as a single API call.

This leaves potentially just 2 API calls (assuming interfaces that accept multiple UUIDs) and avoids crossing domain boundaries.
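The resulting two-call flow could look like the sketch below. The endpoint paths come from this page. Several other parts are assumptions:

  • the `fetch(path, params)` helper;
  • passing multiple ids as a list-valued parameter;
  • the response keys (`items`, `loans`, `itemId`);
  • at most one open loan per item.

Open loans are matched to items in memory. Under these assumptions, the number of calls does not depend on the number of holdings records or items.

```python
from typing import Callable

# Hypothetical HTTP helper: fetch(path, params) -> decoded JSON body.
Fetch = Callable[[str, dict], dict]

ITEMS_BY_INSTANCE = "/inventory-storage/itemsByInstanceID"  # proposed above
LOANS = "/circulation/loans"


def get_rtac_two_calls(instance_ids: list[str], fetch: Fetch) -> list[dict]:
    """instance ids -> (instance -> holdings -> items) -> loans in two calls."""
    # Call 1: the database-level traversal surfaced as a single API call.
    items = fetch(ITEMS_BY_INSTANCE, {"instanceId": instance_ids})["items"]
    item_ids = [item["id"] for item in items]
    # Call 2: all open loans for those items, assuming the loans interface
    # accepts many UUIDs at once; skipped when there are no items.
    loans = (fetch(LOANS, {"itemId": item_ids, "status": "Open"})["loans"]
             if item_ids else [])
    # Join loans to items in memory (assumes at most one open loan per item).
    open_loan_by_item = {loan["itemId"]: loan for loan in loans}
    return [{"item": item, "openLoan": open_loan_by_item.get(item["id"])}
            for item in items]
```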

5. Transform the OAI-PMH view API in Inventory into a more general API

The OAI-PMH view API was created earlier in mod-inventory-storage in order to make the Inventory/OAI-PMH collaboration as efficient as possible. It currently consists of 3 endpoints:

  • /oai-pmh-view/instances - obsolete; kept for backward compatibility and to be removed.

  • /oai-pmh-view/updatedInstanceIds - takes a datetime period for filtering and a couple of configuration parameters (deletedRecordSupport and skipSuppressedFromDiscoveryRecords) and returns the list of UUIDs of instances that were updated within the requested period.

  • /oai-pmh-view/enrichedInstances - takes a list of instance UUIDs and returns the holdings and item information (with related vocabulary values, e.g. locations and material types) for the requested instances.

For RTAC purposes, the /oai-pmh-view/enrichedInstances endpoint could be:

  • renamed to fit a more general context
  • expanded to return more of the required item/holdings fields
  • augmented with a new parameter that defines what information is returned: holdings-level, item-level, or both
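The proposed level parameter could behave as in this sketch. The `Level` values, the parameter semantics and the record shape (`instanceId`, `holdings`, `items`) are hypothetical. They are not the current /oai-pmh-view/enrichedInstances contract. In the real endpoint, the selection would ideally happen in SQL, so that unneeded joins are skipped altogether.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical values for the proposed 'what to return' parameter."""
    HOLDINGS = "holdings"
    ITEMS = "items"
    BOTH = "both"


def select_level(enriched_instance: dict, level: Level) -> dict:
    """Trim one enriched-instance record to the requested level(s).

    Assumes the record carries "instanceId" plus "holdings" and "items"
    lists; the real response shape may differ.
    """
    result = {"instanceId": enriched_instance["instanceId"]}
    if level in (Level.HOLDINGS, Level.BOTH):
        result["holdings"] = enriched_instance.get("holdings", [])
    if level in (Level.ITEMS, Level.BOTH):
        result["items"] = enriched_instance.get("items", [])
    return result
```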

All the benefits and outcomes of option #4 are preserved, and furthermore:

  • it requires less implementation effort;
  • a context-specific API is turned into a general one;
  • no new API is required.