- MODSOURCE-352
Overview
This page defines the technical approach to searching and retrieving MARC authority records via the MARC Authority app. The solution must meet the requirements listed below:
1) Results must be returned via the UI within the following times:
Number of results returned | Time to retrieve and display all results |
---|---|
1 million | 3 seconds |
1 - 3 million | 5 seconds |
3 - 5 million | 8 seconds |
5 - 10 million | 10 seconds |
10 - 15 million | 12 seconds |
2) Requirements for the HTTP response:
- Results should be paginated
- Pagination is controlled by the offset and limit parameters
- The response should be a JSON object with the keys records (array of associated instance UUIDs) and recordCount (integer)
- A search with no results should return an empty records array and a recordCount of 0
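The response contract above can be sketched as a small function. This is a minimal illustration, assuming the full list of matching IDs is already available in memory (in practice the search engine applies offset and limit itself); the function name is hypothetical.

```python
from typing import Any, Dict, List


def build_search_response(matching_ids: List[str], offset: int, limit: int) -> Dict[str, Any]:
    """Return one page of matching record IDs in the required JSON shape.

    recordCount is the total number of matches, not the size of the page,
    so the UI can render pagination controls.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    page = matching_ids[offset:offset + limit]  # slicing past the end yields []
    return {"records": page, "recordCount": len(matching_ids)}
```

For example, a second page of size 1 over three matches returns one ID with recordCount 3, and a search with no matches returns an empty records array with recordCount 0.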
3) The search must support the following options:
- Keyword search
- Fielded search
- Phrase search
- Boolean (AND/OR/NOT)
- Exact phrase
- Truncation
- Wildcard
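The search options above map naturally onto standard CQL constructs. The sketch below is a hypothetical query builder, assuming standard CQL relations (all, adj, ==, and/or/not) and CQL masking characters (* and ?) for truncation and wildcards; the index names keyword, personalName, and subject are illustrative only, and which relations mod-search actually supports for authorities should be checked against its README's supported search types.

```python
BOOLEAN_OPERATORS = {"and", "or", "not"}


def quote(term: str) -> str:
    """Double-quote a CQL term, escaping backslashes and double quotes.

    '*' and '?' are deliberately left unescaped so they keep their CQL
    masking meaning (truncation and wildcard).
    """
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def keyword_search(term: str) -> str:
    # 'keyword' is a hypothetical catch-all index name.
    return f"keyword all {quote(term)}"


def fielded_search(field: str, term: str) -> str:
    # 'all': every word of the term must occur in the given field.
    return f"{field} all {quote(term)}"


def phrase_search(field: str, text: str) -> str:
    # 'adj': the words must occur adjacently and in order.
    return f"{field} adj {quote(text)}"


def exact_search(field: str, text: str) -> str:
    # '==': the whole field value must match exactly.
    return f"{field} == {quote(text)}"


def combine(operator: str, left: str, right: str) -> str:
    # Boolean AND/OR/NOT; parentheses keep precedence explicit.
    op = operator.lower()
    if op not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported boolean operator: {operator}")
    return f"({left}) {op} ({right})"
```

Truncation and wildcard searches then need no separate function: fielded_search("personalName", "Twain*") yields personalName all "Twain*".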
Reference page: Search MARC Authorities
Solution
In the current data-import implementation, Authority records are imported only into SRS, which stores them in a PostgreSQL database. Most of the requirements above, including full-text search, cannot be met by PostgreSQL as efficiently as required.
The solution is to use a search engine designed to index and search data efficiently: Elasticsearch. Elasticsearch achieves fast search responses because, instead of searching the text directly, it searches an index. It stores data as documents rather than in tables and schemas, and it provides extensive REST APIs for storing and searching the data.
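The idea of searching an index instead of the text can be illustrated with a toy inverted index that maps each term to the set of documents containing it. This is a simplified sketch, not Elasticsearch code: the real engine (built on Lucene) adds configurable analyzers, relevance scoring, and compressed on-disk structures, and the tokenizer here (lowercase, split on non-word characters) is an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on runs of non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


class InvertedIndex:
    def __init__(self) -> None:
        # term -> IDs of the documents containing that term
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for token in tokenize(text):
            self._postings[token].add(doc_id)

    def search_all(self, query: str) -> Set[str]:
        """Return IDs of documents containing every query term (AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

A query only touches the posting sets of its own terms, so its cost depends on how many documents contain those terms rather than on the total size of the stored text.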
Elasticsearch is already used in FOLIO to store Instances, Holdings, and Items, and mod-search provides a REST API for searching these records with CQL queries. SRS will still store the originally imported Authority records (MARC + parsed) in PostgreSQL, but the search API will be provided by mod-search. The following documentation explains the supported search types and options: https://github.com/folio-org/mod-search/blob/master/README.md#supported-search-types
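A client would pass the CQL query together with the offset and limit pagination parameters. The sketch below only builds the request URL; the /search/authorities path is an assumption for the endpoint planned in MSEARCH-195, and a real FOLIO request would also carry the X-Okapi-Tenant and X-Okapi-Token headers.

```python
from urllib.parse import quote, urlencode


def build_search_url(base_url: str, cql: str, offset: int = 0, limit: int = 100) -> str:
    """Build a GET URL for a CQL search with offset/limit pagination.

    '/search/authorities' is an assumed path, not a confirmed API.
    """
    params = {"query": cql, "offset": offset, "limit": limit}
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query_string = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/search/authorities?{query_string}"
```

For example, build_search_url("http://okapi:9130/", 'personalName all "Twain"', 0, 10) percent-encodes the spaces and quotes in the CQL expression.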
Storing and searching Authority records can be organized similarly to Instances/Holdings/Items by adding more steps to the Authorities import workflow.
Authorities import workflow
1. A MARC file is uploaded from the UI to mod-data-import
2. MARC records are packed into batches and put on a Kafka queue (EVENT - DI_RAW_RECORDS_CHUNK_READ)
3. mod-srm reads the batches from the queue, validates them, and passes them to mod-srs via a Kafka queue (EVENT - DI_RAW_RECORDS_CHUNK_PARSED)
4. mod-srs stores the Authority records in its PostgreSQL database and returns the result via a Kafka queue (EVENT - DI_PARSED_RECORDS_CHUNK_SAVED)
5. mod-srm reads the profile and creates a JSON payload (containing the parsed MARC record, profile, and mapping parameters) for processing. It exports the payload to the appropriate Kafka queue, one event per MARC record (EVENT - DI_SRS_MARC_AUTHORITY_RECORD_CREATED)
6. mod-inventory receives the event, maps the incoming SRS Authority record to an Authority domain object using the default mapping rules, and stores it in mod-inventory-storage via an HTTP request
7. mod-inventory-storage receives the Authority domain object, saves it to its database, and sends the Authority domain object to Kafka as an event so that indexing can start
8. mod-search receives the event and indexes the incoming Authority record according to the indexing mapping file
Steps 6, 7, and 8 are missing and need to be implemented.
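The new part of the Authorities import workflow (mod-inventory mapping, mod-inventory-storage persisting and emitting an event, mod-search indexing) can be simulated end to end in memory. This is a hypothetical sketch: an in-memory bus stands in for Kafka, a direct function call stands in for the HTTP request, dictionaries stand in for the database and the Elasticsearch index, the event payload shape is simplified, the parsed record is assumed to be MARC-in-JSON under parsedRecord.content, and the single mapping rule (100$a to personalName) is illustrative.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

Handler = Callable[[dict], None]


class InMemoryBus:
    """Stand-in for Kafka: topics map to subscribers; events are delivered FIFO."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[str, dict]] = deque()
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self._pending.append((topic, event))

    def drain(self) -> None:
        # Events published by handlers are appended and processed in the same loop.
        while self._pending:
            topic, event = self._pending.popleft()
            for handler in self._subscribers.get(topic, []):
                handler(event)


bus = InMemoryBus()
authority_storage: Dict[str, dict] = {}  # mod-inventory-storage database stand-in
search_index: Dict[str, dict] = {}       # Elasticsearch index stand-in


def first_subfield(marc: dict, tag: str, code: str) -> Optional[str]:
    """Return the first $code value of the first `tag` data field in MARC-in-JSON."""
    for field in marc.get("fields", []):
        data = field.get(tag)
        if isinstance(data, dict):  # control fields (00x) hold plain strings
            for subfield in data.get("subfields", []):
                if code in subfield:
                    return subfield[code]
    return None


def inventory_handler(event: dict) -> None:
    # Step 6 (mod-inventory): map the SRS record to an Authority domain object
    # with a hypothetical rule (100$a -> personalName), then store it.
    record = event["record"]
    content = record["parsedRecord"]["content"]
    authority = {"id": record["id"], "personalName": first_subfield(content, "100", "a")}
    save_authority(authority)  # stands in for the HTTP call to mod-inventory-storage


def save_authority(authority: dict) -> None:
    # Step 7 (mod-inventory-storage): persist, then emit an event for indexing.
    authority_storage[authority["id"]] = authority
    bus.publish("authority", {"type": "CREATE", "new": authority})


def search_handler(event: dict) -> None:
    # Step 8 (mod-search): index the new state of the record.
    authority = event["new"]
    search_index[authority["id"]] = authority


bus.subscribe("DI_SRS_MARC_AUTHORITY_RECORD_CREATED", inventory_handler)
bus.subscribe("authority", search_handler)
```

Publishing one DI_SRS_MARC_AUTHORITY_RECORD_CREATED event and draining the bus leaves the mapped Authority in both the storage and the index stand-ins.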
Implementation steps
Below is a list of steps; each step covers an area of the corresponding module. I added rough estimates to the steps that Spitfire can implement; the other steps need the Falcon team's help to estimate.
- Create schema for Authority domain object - MODINV-504
- Create default Action profile and Mapping profile - MODDICORE-180
- Create mapping rules - MODSOURMAN-573
- Create processor to generate Authority domain objects from MARC records using the mapping rules - MODDICORE-181
- Create CRUD REST API to receive Authority domain object and save it into storage - MODINVSTOR-787
- Create handler to generate Authority domain objects from MARC records - MODINV-501
- Handle an update of Authority record via mod-quick-marc - MODINV-503
- Automatic creation of "authority" Kafka topic - MODINVSTOR-788
- Sending of "Domain Events" to Kafka topic when Authority record is created/updated/deleted - MODINVSTOR-789
- Add search API for Authority Records - MSEARCH-195
- Add /authorities/ids API for Authority Records search - MSEARCH-197
- Refactor /index API to support Authority Records - MSEARCH-196
- Create Karate tests to cover Authority Records search in mod-search - FAT-990
- mod-search Authority Records search performance tests - PERF-200
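The domain events in MODINVSTOR-789 let mod-search keep its index in sync with storage. The sketch below is a hypothetical model of such events: the field names (type, tenant, old, new) and the CREATE/UPDATE/DELETE types are assumptions about the event shape, and a plain dictionary stands in for the search index.

```python
from typing import Dict, Optional

VALID_TYPES = {"CREATE", "UPDATE", "DELETE"}


def domain_event(event_type: str, tenant: str,
                 old: Optional[dict], new: Optional[dict]) -> dict:
    """Build a domain event carrying the record state before and after the change."""
    if event_type not in VALID_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    if event_type == "CREATE" and (old is not None or new is None):
        raise ValueError("CREATE carries only 'new'")
    if event_type == "UPDATE" and (old is None or new is None):
        raise ValueError("UPDATE carries both 'old' and 'new'")
    if event_type == "DELETE" and (old is None or new is not None):
        raise ValueError("DELETE carries only 'old'")
    return {"type": event_type, "tenant": tenant, "old": old, "new": new}


def apply_to_index(index: Dict[str, dict], event: dict) -> None:
    """Apply one domain event to a search index keyed by record ID."""
    if event["type"] == "DELETE":
        index.pop(event["old"]["id"], None)
    else:  # CREATE and UPDATE both store the new state
        index[event["new"]["id"]] = event["new"]
```

Events for one record must be applied in order. Using the record ID as the Kafka message key would achieve that, because Kafka's default partitioner sends messages with the same key to the same partition.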
Separate searching for heading values (1xx) and auxiliary values (4xx, 5xx)
To support searching heading values (1xx) only, as well as searching heading values (1xx) together with auxiliary values (4xx, 5xx), two separate fields should be created in Inventory (and Elasticsearch) for each field the user should be able to search.
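The two-fields-per-searchable-field approach can be sketched as follows. This assumes the record is in MARC-in-JSON form, uses only $a, and covers two heading types as examples (personal names 100/400/500 and topical terms 150/450/550); the field names personalName/personalNameAll and topicalTerm/topicalTermAll are hypothetical.

```python
from typing import Dict, List

# Searchable field -> last two digits shared by its 1xx heading and 4xx/5xx tracings.
SEARCHABLE_FIELDS = {
    "personalName": "00",  # 100, 400, 500
    "topicalTerm": "50",   # 150, 450, 550
}


def subfield_a_values(marc: dict, tag: str) -> List[str]:
    """Collect $a from every data field with the given tag."""
    values: List[str] = []
    for field in marc.get("fields", []):
        data = field.get(tag)
        if isinstance(data, dict):  # skip control fields, which hold plain strings
            values += [sf["a"] for sf in data.get("subfields", []) if "a" in sf]
    return values


def build_search_document(marc: dict) -> Dict[str, List[str]]:
    doc: Dict[str, List[str]] = {}
    for name, suffix in SEARCHABLE_FIELDS.items():
        heading = subfield_a_values(marc, "1" + suffix)
        auxiliary = subfield_a_values(marc, "4" + suffix) + subfield_a_values(marc, "5" + suffix)
        doc[name] = heading                      # heading values (1xx) only
        doc[name + "All"] = heading + auxiliary  # 1xx plus 4xx and 5xx
    return doc
```

Searching personalName then matches only the authorized heading, while personalNameAll also matches see-from (4xx) and see-also (5xx) forms. Duplicating values at index time keeps queries simple at the cost of a somewhat larger index.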
11 Comments
Khalilah Gambrell
Igor Gorchakov and Mikhail Fokanov,
Ann-Marie Breaux
Hi Khalilah Gambrell, when you say "both teams", are you talking about Spitfire and Folijet? If yes, and this is for Lotus, then I'll need to know ASAP what work might be required of Folijet.
Khalilah Gambrell
Ann-Marie Breaux, teams will/may be Falcon and Spitfire. Folijet should be off the hook.
Magda Zacharska
Igor Gorchakov and Mikhail Fokanov, what is the benefit of introducing steps 6 and 7? They significantly increase the complexity of the implementation, as they require defining a data model to store authority records in mod-inventory-storage. Also, are the default mapping rules for the authority records already defined?
Khalilah Gambrell
Magda Zacharska, mapping is not necessary for authority records.
Ann-Marie Breaux
Good question about steps 6-7, Magda Zacharska. Are the MARC Authority records interacting with MODINV or MODINVSTOR in some way? As Khalilah Gambrell said, no mapping is needed, since there is currently no Inventory UI surfacing Authority records.
Khalilah Gambrell
9/22/2021 meeting notes
Action items:
Magda Zacharska
Ann-Marie Breaux and Khalilah Gambrell: step 6 states:
My question referred to the mapping rules stated in this step. Since the data will be imported from SRS into mod-inventory-storage, some type of mapping will be required.
Khalilah Gambrell
Igor Gorchakov which JIRA stories do you suggest Spitfire be assigned to implement?
Ann-Marie Breaux
Hi Khalilah Gambrell, Magda Zacharska, and Igor Gorchakov, a lot of the workflow and implementation steps that you're describing for MOD-INV, MOD-INVSTOR, MOD-DICORE, MOD-DICONV, and MOD-SOURMAN overlap with work done by Folijet, and may have an impact on the Data Import UI or BE. I know that Spitfire and Folijet devs are meeting regularly to stay aligned. If possible, please make sure that a Folijet dev reviews PRs in these modules. We also want to be aware of any impact on the Data Import UI, so that it can be factored into Folijet or Spitfire work and so that the BE changes don't break the UI (e.g. UIDATIMP-1021). Thank you!
cc: Kateryna Senchenko Ivan Kryzhanovskyi
Oleksandr Dekin
Ann-Marie Breaux, we actually do assign the Folijet team to our PRs. Maybe we missed a few, but this kind of error would show up on the UI side, and there is no UI team in our meetings and we don't assign the UI team to back-end PRs. In any case, from now on we will assign the Folijet team to every PR related to the MOD-INV, MOD-INVSTOR, MOD-DICORE, MOD-DICONV, and MOD-SOURMAN modules.
I am confused about only one point: you posted a comment in MODDICONV-209 about rolling back this change. Why do we need to revert it if this work is for the Lotus release and, on the UI side, this development does not take much time?
P.S. We also did not meet with the Folijet team last week due to a conflict with another meeting: "FOLIO - Optimistic Locking - Lotus testing plan discussion".
cc: Khalilah Gambrell Oleksii Petrenko