
MODSOURCE-352

Overview

This page defines the technical approach to searching and retrieving MARC authority records via the MARC Authority app. The solution must meet the requirements listed below:

1) Results must be returned via the UI within the following time limits.

Number of results returned    Time to retrieve all results and display results
1 million                     3 seconds
1 - 3 million                 5 seconds
3 - 5 million                 8 seconds
5 - 10 million                10 seconds
10 - 15 million               12 seconds

2) Requirements for the HTTP response: 

  • Results should be paginated
  • The offset and limit parameters determine pagination
  • The response body should be JSON with the keys: records (array of associated instance UUIDs), recordCount (integer)
  • A search with no results should return an empty records array and a recordCount of 0
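A minimal sketch of a response that satisfies requirement 2 is shown below. The function name, the default limit, and the placeholder identifiers are hypothetical, and the sketch assumes that recordCount is the total number of matches rather than the size of the current page; the requirements above do not say which.

```python
import json


def build_search_response(matching_ids, offset=0, limit=100):
    """Return one page of results in the shape described in requirement 2.

    Assumption: recordCount is the total number of matches, not the page size.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    page = matching_ids[offset:offset + limit]
    return {"records": page, "recordCount": len(matching_ids)}


# Placeholder identifiers for illustration only, not real UUIDs.
ids = [f"id-{n}" for n in range(5)]
print(json.dumps(build_search_response(ids, offset=2, limit=2)))
print(json.dumps(build_search_response([], offset=0, limit=10)))
```

An offset beyond the last match yields an empty records array while recordCount still reports the total, so a client can tell "no more pages" apart from "no matches".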

3) The search needs to support the following options:

  • Keyword search
  • Fielded search
  • Phrase search
  • Boolean (AND/OR/NOT)
  • Exact phrase
  • Truncation
  • Wildcard

Reference page: Search MARC Authorities


Solution

In the current data-import implementation, Authority records are imported only into SRS. SRS uses a Postgres database to store the records. Most of the requirements above, including full-text search, are not supported by Postgres as efficiently as needed.

The solution is to use a search engine, which is designed to index and search data more efficiently. We will use Elasticsearch. It achieves fast search responses because, instead of searching the text directly, it searches an index. It uses a structure based on documents instead of tables and schemas, and comes with extensive REST APIs for storing and searching data.
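To illustrate what "searching an index instead of the text" means, here is a toy inverted index. It is only a sketch of the general idea, not how Elasticsearch is implemented internally; the class name, the tokenizer, and the sample headings are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each token to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search_all(self, query):
        """Return ids of documents containing every token of the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


index = InvertedIndex()
index.add("a1", "Twain, Mark, 1835-1910")
index.add("a2", "Shakespeare, William, 1564-1616")
print(index.search_all("mark twain"))
```

A lookup touches only the posting sets of the query tokens, so its cost depends on how many documents contain those tokens rather than on the total amount of stored text.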

Elasticsearch is already used in FOLIO, where it stores Instances, Holdings, and Items. mod-search provides a REST API to search records using CQL. SRS will still store the originally imported Authority records (MARC + Parsed) in Postgres, but the search API will be provided by mod-search. The supported search types and options are documented here: https://github.com/folio-org/mod-search/blob/master/README.md#supported-search-types
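As a rough sketch, some of the search options from requirement 3 could be expressed as CQL strings like this. The operators used (all, ==, =, the * wildcard, and the and/or/not booleans) are general CQL syntax; which of them mod-search will accept for Authority fields, and the field name personalName, are assumptions that should be checked against the README linked above.

```python
def quote(term):
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def keyword(field, words):
    """Match records that contain all of the given words."""
    return f"{field} all {quote(words)}"


def exact_phrase(field, phrase):
    """Match the phrase exactly."""
    return f"{field} == {quote(phrase)}"


def truncation(field, prefix):
    """Match values that start with the prefix (right truncation)."""
    return f"{field} = {quote(prefix + '*')}"


def combine(operator, left, right):
    """Join two CQL clauses with a boolean operator."""
    if operator not in ("and", "or", "not"):
        raise ValueError("operator must be 'and', 'or' or 'not'")
    return f"({left}) {operator} ({right})"


# "personalName" is a hypothetical field name used only for illustration.
query = combine(
    "and",
    keyword("personalName", "twain mark"),
    truncation("personalName", "clem"),
)
print(query)
```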

We can organize storing and searching Authority records in the same way as for Instances/Holdings/Items by adding more steps to the Authorities import workflow.


Authorities import workflow

  1. The MARC file is uploaded from the UI to mod-data-import
  2. MARC records are packed into batches and put on a Kafka queue (EVENT - DI_RAW_RECORDS_CHUNK_READ)
  3. mod-srm reads the batches from the queue, validates them, and passes them to mod-srs via a Kafka queue (EVENT - DI_RAW_RECORDS_CHUNK_PARSED)
  4. mod-srs stores the Authority records in the PostgreSQL database and returns the result via a Kafka queue (EVENT - DI_PARSED_RECORDS_CHUNK_SAVED)
  5. mod-srm reads the profile and creates a JSON payload (containing the parsed MARC, profile, and mapping parameters) for processing. It exports the payload to the appropriate Kafka queue, one event per MARC entry - DI_SRS_MARC_AUTHORITY_RECORD_CREATED
  6. + mod-inventory receives the event and maps the incoming SRS Authority record to an Authority domain object using default mapping rules. It stores the object in mod-inventory-storage by sending an HTTP request
  7. + mod-inventory-storage receives the incoming Authority domain object, saves it to the database, and, to start reindexing, sends the Authority domain object in an event to Kafka
  8. + mod-search receives the event and indexes the incoming Authority record according to the indexing mapping file

    Steps 6, 7, and 8 (marked with +) are missing and need to be implemented
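The missing steps 6-8 can be sketched in memory as follows. This is an illustration under several assumptions, not the actual module code: the mapping rules, field names, and class names are hypothetical, a deque stands in for the Kafka topic, the parsed record is assumed to follow the MARC-in-JSON layout, and the event envelope with type/old/new keys is an assumed shape.

```python
from collections import deque

# Hypothetical default mapping rules: MARC tag -> Authority field name.
MAPPING_RULES = {"100": "personalName", "110": "corporateName"}


def map_to_authority(record_id, parsed_marc):
    """Step 6: map a parsed MARC record to an Authority domain object."""
    authority = {"id": record_id}
    for field in parsed_marc["fields"]:
        for tag, value in field.items():
            target = MAPPING_RULES.get(tag)
            # Control fields hold plain strings; data fields hold dicts.
            if target and isinstance(value, dict):
                subfields = value.get("subfields", [])
                authority[target] = " ".join(
                    text for subfield in subfields for text in subfield.values()
                )
    return authority


class AuthorityStorage:
    """Step 7: save the object, then publish a domain event."""

    def __init__(self, topic):
        self.rows = {}
        self.topic = topic  # stand-in for the "authority" Kafka topic

    def save(self, authority):
        old = self.rows.get(authority["id"])
        self.rows[authority["id"]] = authority
        event_type = "UPDATE" if old is not None else "CREATE"
        self.topic.append({"type": event_type, "old": old, "new": authority})


class AuthorityIndexer:
    """Step 8: consume domain events and index the new state."""

    def __init__(self):
        self.documents = {}

    def consume(self, topic):
        while topic:
            event = topic.popleft()
            self.documents[event["new"]["id"]] = event["new"]


topic = deque()
storage = AuthorityStorage(topic)
indexer = AuthorityIndexer()
marc = {"fields": [{"001": "n1"}, {"100": {"subfields": [{"a": "Twain, Mark,"}, {"d": "1835-1910"}]}}]}
storage.save(map_to_authority("id-1", marc))
indexer.consume(topic)
print(indexer.documents)
```

Keeping the indexer driven purely by events means the search index can be rebuilt by replaying events from storage, which is why step 7 publishes after every save.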

Implementation steps

Here is a list of steps; each step covers an area in the corresponding module. I added rough estimates to the steps that Spitfire can implement; the other steps require the Falcon team's help with estimating.

  • Create schema for Authority domain object (MODINV-504)
  • Create default Action profile and Mapping profile (MODDICORE-180)
  • Create mapping rules (MODSOURMAN-573)
  • Create processor to generate Authority domain objects from MARC records by the mapping rules (MODDICORE-181)
  • Create CRUD REST API to receive Authority domain object and save it into storage (MODINVSTOR-787)
  • Create Handler to generate Authority domain objects from MARC records (MODINV-501)
  • Handle an update of Authority record via mod-quick-marc (MODINV-503)
  • Automatic creation of "authority" Kafka topic (MODINVSTOR-788)
  • Sending of "Domain Events" to Kafka topic when Authority record is created/updated/deleted (MODINVSTOR-789)
  • Add search API for Authority Records (MSEARCH-195)
  • Add /authorities/ids API for Authority Records search (MSEARCH-197)
  • Refactor /index API to support Authority Records (MSEARCH-196)
  • Create Karate tests to cover Authority Records search in mod-search (FAT-990)
  • Mod-search Authority Records search performance tests (PERF-200)

Separate searching for heading values (1xx) and auxiliary values (4xx 5xx)

To support searching either in heading values (1xx) only or across both heading values (1xx) and auxiliary values (4xx, 5xx), two separate fields should be created in Inventory (and Elasticsearch) for each field that the user should be able to search in.
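A minimal sketch of the two-field idea for a single searchable field is shown below. The field names personalNameHeading and personalNameAll are hypothetical, as is the simplified input of (MARC tag, text) pairs and the substring matching used to demonstrate the difference.

```python
def build_name_fields(entries):
    """Split (marc_tag, text) pairs into a heading-only field and a combined field."""
    heading = [text for tag, text in entries if tag.startswith("1")]
    auxiliary = [text for tag, text in entries if tag.startswith(("4", "5"))]
    return {
        "personalNameHeading": heading,            # 1xx only
        "personalNameAll": heading + auxiliary,    # 1xx plus 4xx and 5xx
    }


def matches(document, field, term):
    """Case-insensitive substring match against one field of the document."""
    term = term.lower()
    return any(term in value.lower() for value in document[field])


document = build_name_fields([
    ("100", "Twain, Mark, 1835-1910"),
    ("400", "Clemens, Samuel Langhorne, 1835-1910"),
])
print(matches(document, "personalNameHeading", "clemens"))
print(matches(document, "personalNameAll", "clemens"))
```

Duplicating the heading into the combined field costs some index space, but it lets each kind of search target exactly one field instead of combining fields at query time.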



11 Comments

  1. Igor Gorchakov and Mikhail Fokanov,

    • There seem to be a lot of modules that must be touched for these records. Which workflows' performance should we monitor for slowness as a result?
    • This spike seems to focus on exposing this information to search. Where can one learn about implementing search capabilities such as Boolean, etc.?
    • Can estimates be provided on the effort to implement each one of the implementation steps?
    • Compared to just using the SRS Search API, what are the benefits of this approach? What are the risks? If we went with the SRS Search API, how long would it take to implement?
    • Which team will implement each one of the implementation steps?
    • If both teams will be working on this feature, have the teams developed a contract to define how they will work together?
    • I believe we should have an implementation step related to performance testing. Thoughts?
    1. Hi Khalilah Gambrell. When you say "both teams" are you talking about Spitfire and Folijet? If yes, and this is for Lotus, then I'll need to know ASAP what work might be required of Folijet

      1. Ann-Marie Breaux, teams will/may be Falcon and Spitfire. Folijet should be off the hook.

  2. Igor Gorchakov and Mikhail Fokanov, what is the benefit of introducing steps 6 and 7? It significantly increases the complexity of the implementation, as it will require defining a data model to store authority records in mod-inventory-storage. Also, are the default mapping rules for the authority records already defined?

  3. Magda Zacharska, mapping is not necessary for authority records.

  4. Good question about steps 6-7, Magda Zacharska. Are the MARC Authority records interacting with MODINV or MODINVSTOR in some way? Like Khalilah Gambrell, I think no mapping is needed, since currently there is no Inventory UI surfacing Authority records.

  5. 9/22/2021 meeting notes

    Action items: 

    • Mikhail Fokanov and Igor Gorchakov will update the Authorities import workflow and implementation steps
    • Once updates are complete then we will have another meeting to confirm understanding 
    • Based on understanding either Falcon, Spitfire, or both will contact Prokopovych to discuss updates to mod-inventory modules
    • Falcon will discuss UXPROD-3130 and begin to create technical stories.
  6. Ann-Marie Breaux and Khalilah Gambrell: step 6 states:

    mod-inventory receives the event, maps the incoming SRS Authority record to Authority domain object using default mapping rule

    My question referred to the mapping rule stated in this step. Since the data will be imported from SRS into mod-inventory-storage, some type of mapping will be required.

  7. Igor Gorchakov, which JIRA stories do you suggest Spitfire be assigned to implement?

  8. Hi Khalilah Gambrell, Magda Zacharska, and Igor Gorchakov. A lot of the workflow and implementation steps that you're describing for MOD-INV, MOD-INVSTOR, MOD-DICORE, MOD-DICONV, and MOD-SOURMAN mix with work done by Folijet, and may have an impact on the Data Import UI or BE. I know that Spitfire and Folijet devs are meeting regularly to stay aligned. If possible, please be sure that a Folijet dev reviews PRs in these modules. We also want to be aware of any impact on the UI for Data Import, so that it can be factored into Folijet or Spitfire work, and so that the BE changes don't break the UI (e.g. UIDATIMP-1021). Thank you!

    cc: Kateryna Senchenko Ivan Kryzhanovskyi

    1. Ann-Marie Breaux, we actually assign the Folijet team to our PRs. Maybe we missed a few PRs, but we would catch this error in the UI; there is no UI team in our meeting and we don't assign the UI team to back-end PRs. Anyway, we will assign the Folijet team to every PR related to the MOD-INV, MOD-INVSTOR, MOD-DICORE, MOD-DICONV, and MOD-SOURMAN modules in the future.

      I am confused about only one point: you posted a comment in MODDICONV-209 about rolling back this change. Why do we need to revert it if this work is for the Lotus release and, from the UI side, this development does not take much time?


      P.S. Also, we didn't have a meeting with the Folijet team last week due to a conflict with another one: "FOLIO - Optimistic Locking - Lotus testing plan discussion"


      cc: Khalilah Gambrell Oleksii Petrenko