Overview

This page defines the technical approach to searching and retrieving MARC authority records via the MARC Authority app. The solution should satisfy the requirements listed below:

1) The results need to be returned and displayed via the UI within the following time limits:

Number of results returned    Time to retrieve all results and display results
1 million                     3 seconds
1 - 3 million                 5 seconds
3 - 5 million                 8 seconds
5 - 10 million                10 seconds
10 - 15 million               12 seconds

2) Requirements for the HTTP response:

3) The search needs to support the following options:

Reference page: Search MARC Authorities


Solution

In the current data-import implementation, Authority records are imported only into SRS, which stores them in a PostgreSQL database. Most of the requirements listed above involve full-text search, which PostgreSQL does not support as efficiently as required.

The solution is to use a search engine designed to index and search data more efficiently. We will use Elasticsearch. It achieves fast search responses because, instead of searching the text directly, it searches an index. It uses a document-based structure instead of tables and schemas, and comes with extensive REST APIs for storing and searching the data.

Elasticsearch is already used in FOLIO: it stores Instances, Holdings, and Items, and mod-search provides a REST API to search these records using CQL queries. SRS will still store the originally imported Authority records (MARC + parsed) in PostgreSQL, while the search API will be provided by mod-search. This documentation explains the supported search types and options: https://github.com/folio-org/mod-search/blob/master/README.md#supported-search-types
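To make the shape of such a search concrete, here is a minimal sketch of building a CQL search request against mod-search. The /search/authorities path and the personalName search option are assumptions modeled on the existing Instance search API; the actual route and option names will be defined when the Authority search API is implemented.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, limit: int = 100) -> str:
    """Build a CQL search request URL against mod-search.

    The /search/authorities path is an assumption based on the existing
    /search/instances endpoint; consult the mod-search README for the
    actual route once the Authority search API is available.
    """
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url}/search/authorities?{params}"


# Hypothetical example: authorities whose personal-name heading starts with "Dickens"
url = build_search_url("https://folio.example.org", 'personalName all "Dickens*"')
```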

We can organize storing and searching Authority records similarly to Instances/Holdings/Items by adding more steps to the Authorities import workflow.


Authorities import workflow

  1. A MARC file is uploaded from the UI to mod-data-import
  2. MARC records are packed into batches and put onto a Kafka queue (EVENT - DI_RAW_RECORDS_CHUNK_READ)
  3. mod-srm reads the batches from the queue, validates them, and passes them to mod-srs via a Kafka queue (EVENT - DI_RAW_RECORDS_CHUNK_PARSED)
  4. mod-srs stores the Authority records in its PostgreSQL database and returns the result via a Kafka queue (EVENT - DI_PARSED_RECORDS_CHUNK_SAVED)
  5. mod-srm reads the profile and creates a JSON payload (containing the parsed MARC, the profile, and mapping parameters) for processing, then exports it to the appropriate Kafka queue, one event per MARC entry (EVENT - DI_SRS_MARC_AUTHORITY_RECORD_CREATED)
  6. + mod-inventory receives the event, maps the incoming SRS Authority record to an Authority domain object using the default mapping rules, and stores it in mod-inventory-storage via an HTTP request
  7. + mod-inventory-storage receives the incoming Authority domain object, saves it into its database (making it available for reindexing), and sends the Authority domain object in an event to Kafka
  8. + mod-search receives the event and indexes the incoming Authority record according to the index mapping file

    Steps 6, 7, and 8 are missing and need to be implemented
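The indexing step (8) above can be sketched as a transformation from the Kafka domain event into an Elasticsearch document. All field names here (new, tenant, headingValue, sftHeadings, saftHeadings) are illustrative assumptions; the real names are defined by mod-search's event schema and index mapping file.

```python
def to_search_document(event: dict) -> dict:
    """Map an Authority domain event (workflow step 8) to a search document.

    Assumes a hypothetical event shape: {"tenant": ..., "new": {...}} where
    "new" holds the Authority domain object produced in step 6.
    """
    authority = event["new"]
    return {
        "id": authority["id"],
        "tenantId": event["tenant"],
        # 1xx heading value
        "headingValue": authority.get("personalName"),
        # 4xx see-from and 5xx see-also-from values
        "sftHeadings": authority.get("sftPersonalName", []),
        "saftHeadings": authority.get("saftPersonalName", []),
    }
```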

Implementation steps

Here is a list of steps; each step covers an area in a corresponding module. I added rough estimates to the steps that the Spitfire team can implement; estimating the other steps requires help from the Falcon team.

Separate searching for heading values (1xx) and auxiliary values (4xx, 5xx)

In order to support searching in heading values (1xx) only, as well as across both heading values (1xx) and auxiliary values (4xx, 5xx), two separate fields should be created in Inventory (and Elasticsearch) for each field the user should be able to search in.
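The split described above could look roughly like the following sketch, which separates MARC datafields into the two document fields. The tag sets and the headingValue/auxiliaryValues names are illustrative assumptions, not the final mapping; in a real record each value would also be assembled from subfields.

```python
# 1xx heading tags for authority records (illustrative subset)
HEADING_TAGS = {"100", "110", "111", "130", "150", "151", "155"}
# 4xx (see-from) and 5xx (see-also-from) auxiliary tags
AUX_TAG_PREFIXES = ("4", "5")


def split_headings(marc_fields):
    """Split (tag, value) pairs into heading (1xx) and auxiliary (4xx/5xx) values.

    Returns a dict with two hypothetical search-document fields so each can
    be indexed and queried independently.
    """
    headings, auxiliary = [], []
    for tag, value in marc_fields:
        if tag in HEADING_TAGS:
            headings.append(value)
        elif len(tag) == 3 and tag.startswith(AUX_TAG_PREFIXES):
            auxiliary.append(value)
    return {"headingValue": headings, "auxiliaryValues": auxiliary}
```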