R3 2021 (Kiwi) Data Import improvements


Accompanies UXPROD-3023 and MODDATAIMP-412

P2-priority items are red; P3 items are green

Motivation

Data import has been moved to a new solution that uses Kafka as a direct transport. This solved the main problems caused by the limitations of the HTTP protocol for data transfer and made it possible to implement a message delivery queue. However, the large amount of data in each message payload has significantly increased Kafka's disk usage. In addition, the current implementation uses a single mechanism both to calculate progress and to display it on the UI, which slows down data import.

Goal

To improve the stability and speed of data import, three main goals need to be achieved: reduce the size of the payload in messages; separate the mechanism for calculating progress from the one that displays it on the UI; and add Kafka error handling, so that errors are written to the log and included in progress counting.

Main steps

Reduce the size of the payload

As part of this improvement, all objects that do not change during data processing will be removed from the dataImportEventPayload context. The MappingRules, MappingParams, and JobProfileSnapshot objects should not be part of the Kafka message. Instead, each module will fetch this data from SRM (mod-source-record-manager) over HTTP and cache it locally under the jobExecutionId key.
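The caching step described above can be sketched as follows. This is a minimal illustration, not the modules' actual code: JobParams, JobParamsCache, and the loader function are hypothetical names, and the loader lambda merely stands in for the HTTP call to SRM, whose endpoint is not specified here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class JobParamsCacheDemo {
  public static void main(String[] args) {
    AtomicInteger srmCalls = new AtomicInteger();
    // The lambda stands in for an HTTP GET to SRM; no real endpoint is assumed here.
    JobParamsCache cache = new JobParamsCache(id -> {
      srmCalls.incrementAndGet();
      return new JobParams("rules:" + id, "params:" + id, "snapshot:" + id);
    });
    cache.get("job-1");
    cache.get("job-1"); // second lookup is served from the local cache
    System.out.println("SRM calls: " + srmCalls.get());
    cache.evict("job-1");
  }
}

/** Hypothetical holder for the data that stays constant during one job execution. */
final class JobParams {
  final String mappingRules;
  final String mappingParams;
  final String jobProfileSnapshot;

  JobParams(String mappingRules, String mappingParams, String jobProfileSnapshot) {
    this.mappingRules = mappingRules;
    this.mappingParams = mappingParams;
    this.jobProfileSnapshot = jobProfileSnapshot;
  }
}

/** Per-module cache keyed by jobExecutionId (hypothetical class, for illustration only). */
class JobParamsCache {
  private final Map<String, JobParams> cache = new ConcurrentHashMap<>();
  private final Function<String, JobParams> srmLoader;

  JobParamsCache(Function<String, JobParams> srmLoader) {
    this.srmLoader = srmLoader;
  }

  /** Returns the cached params, invoking the loader only on the first request for a job. */
  JobParams get(String jobExecutionId) {
    return cache.computeIfAbsent(jobExecutionId, srmLoader);
  }

  /** Drops the entry once the job has finished so the cache does not grow without bound. */
  void evict(String jobExecutionId) {
    cache.remove(jobExecutionId);
  }
}
```

Keying on jobExecutionId means every event of the same job reuses one snapshot, so the Kafka message only needs to carry the job identifier rather than the objects themselves.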

To implement this solution, we need an API and storage in SRM that expose this data. The other modules involved in data import should load the data and read the values they need from their cache. We also need to move payload zipping to the Kafka side to reduce CPU usage by the modules.
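One way to read "zipping on the Kafka side" is to rely on the producer's built-in batch compression instead of compressing payloads in application code. The sketch below assumes that reading. The ProducerCompressionConfig class and the gzip choice are illustrative assumptions; only the property keys are standard Kafka producer settings.

```java
import java.util.Properties;

class ProducerCompressionConfig {
  /** Builds producer settings that delegate payload compression to the Kafka client. */
  static Properties producerProperties(String bootstrapServers) {
    Properties props = new Properties();
    props.put("bootstrap.servers", bootstrapServers);
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // Standard producer setting: record batches are compressed by the client library,
    // so the module no longer zips each event payload by hand.
    props.put("compression.type", "gzip");
    return props;
  }

  public static void main(String[] args) {
    // The broker address is a placeholder for illustration.
    Properties props = producerProperties("localhost:9092");
    System.out.println(props.getProperty("compression.type"));
  }
}
```

Consumers need no matching setting, because the Kafka client decompresses batches transparently when reading them.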

Story | Estimation
MODSOURMAN-463 Create storage and API for MappingRules and MappingParams | 3
MODSOURMAN-464 Store snapshots of MappingRules and MappingParams to the database | 2
MODSOURMAN-465 Remove MappingRules, MappingParams, and JobProfileSnapshot from the event payload | 1
MODSOURMAN-466 Remove zipping mechanism for data import event payloads | 2
MODSOURCE-286 Remove zipping mechanism for data import event payloads and use cache for params | 5
MODINV-405 Remove zipping mechanism for data import event payloads and use cache for params | 5
MODINVOICE-251 Remove zipping mechanism for data import event payloads and use cache for params | 2

Separate the mechanism for calculating progress and displaying it on the UI

To keep the application stable under load and with a large number of users, the mechanism for calculating job progress needs to be revised. The best solution is to add a new API that supports the landing page UI and returns lightweight DTOs rather than the full job execution objects involved in the data import process. The new objects will be stored separately from job executions and updated in the background. Adding indexes will help reduce the size of logs and will help with log sorting and retrieval. This will also help when multiple users open the landing page at the same time.
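As a rough illustration of the idea, the sketch below keeps progress as plain counters, separate from the job execution itself, and exposes only a small summary object to the landing page. ProgressSummary and ProgressStore are hypothetical names. The in-memory map stands in for the planned database table, and the background update job is not modeled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ProgressDemo {
  public static void main(String[] args) {
    ProgressStore store = new ProgressStore();
    store.start("job-1", 3);
    store.recordProcessed("job-1", false);
    store.recordProcessed("job-1", true);
    System.out.println(store.summary("job-1"));
  }
}

/** Hypothetical lightweight DTO for the landing page: counters only, no job details. */
final class ProgressSummary {
  final String jobExecutionId;
  final int total;
  final int succeeded;
  final int failed;

  ProgressSummary(String jobExecutionId, int total, int succeeded, int failed) {
    this.jobExecutionId = jobExecutionId;
    this.total = total;
    this.succeeded = succeeded;
    this.failed = failed;
  }

  /** A job is finished once every record has been counted as succeeded or failed. */
  boolean isFinished() {
    return succeeded + failed >= total;
  }

  @Override
  public String toString() {
    return jobExecutionId + ": " + (succeeded + failed) + "/" + total + " processed, " + failed + " failed";
  }
}

/** In-memory stand-in for the plain counter table (hypothetical, for illustration only). */
class ProgressStore {
  private final Map<String, ProgressSummary> rows = new ConcurrentHashMap<>();

  void start(String jobExecutionId, int total) {
    rows.put(jobExecutionId, new ProgressSummary(jobExecutionId, total, 0, 0));
  }

  /** Atomically bumps one counter; unknown job ids are ignored. */
  void recordProcessed(String jobExecutionId, boolean error) {
    rows.computeIfPresent(jobExecutionId, (id, p) -> new ProgressSummary(
        id, p.total, p.succeeded + (error ? 0 : 1), p.failed + (error ? 1 : 0)));
  }

  ProgressSummary summary(String jobExecutionId) {
    return rows.get(jobExecutionId);
  }
}
```

Because the landing page reads only these counters, repeated polling by many users does not require loading the full job execution objects.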

Story | Estimation
MODSOURMAN-468 Create a new API and database table that should store and represent information for the Data-Import landing page | 5
MODSOURMAN-469 Change data-import progress mechanism with a new plain DB table counter and background job | 8
UIDATIMP-918 Use new API for DataImport landing page | 3

KAFKA error handling

To make data import more stable when using Kafka, error handling and the application's response to errors need to be worked out in more detail. This requires changes to the shared folio-kafka-wrapper library and to the modules that use it. All errors should be logged and included in the data-import journal. Test coverage of the library should also be increased, including edge cases. This should keep jobs from getting stuck: each job should either complete, complete with errors, or fail.
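The intended behavior can be sketched as a consumer-side wrapper that never lets a failed record disappear silently. The stories name a ProcessRecordErrorHandler, but its signature is not given here, so the RecordErrorHandlerSketch interface, the SafeRecordProcessor class, and the JobStatus values below are hypothetical and only illustrate the flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ErrorHandlingDemo {
  public static void main(String[] args) {
    List<String> journal = new ArrayList<>();
    SafeRecordProcessor processor = new SafeRecordProcessor(
        record -> {
          if (record.isEmpty()) {
            throw new IllegalArgumentException("empty record");
          }
        },
        (cause, record) -> journal.add("ERROR: " + cause.getMessage()));
    processor.process("record-1");
    processor.process("");
    System.out.println(processor.status() + " " + journal);
  }
}

/** Hypothetical callback shape; the real ProcessRecordErrorHandler may differ. */
interface RecordErrorHandlerSketch {
  void handle(Throwable cause, String record);
}

/** Hypothetical terminal states matching "complete, complete with errors, or fail". */
enum JobStatus { COMPLETED, COMPLETED_WITH_ERRORS, FAILED }

/** Wraps the business handler so every failure is reported and counted. */
class SafeRecordProcessor {
  private final Consumer<String> businessHandler;
  private final RecordErrorHandlerSketch errorHandler;
  private int succeeded;
  private int failed;

  SafeRecordProcessor(Consumer<String> businessHandler, RecordErrorHandlerSketch errorHandler) {
    this.businessHandler = businessHandler;
    this.errorHandler = errorHandler;
  }

  void process(String record) {
    try {
      businessHandler.accept(record);
      succeeded++;
    } catch (RuntimeException e) {
      // The failure still counts toward progress, so the job cannot get stuck waiting for it.
      failed++;
      errorHandler.handle(e, record);
    }
  }

  /** Status once all records have been processed. */
  JobStatus status() {
    if (failed == 0) {
      return JobStatus.COMPLETED;
    }
    return succeeded == 0 ? JobStatus.FAILED : JobStatus.COMPLETED_WITH_ERRORS;
  }
}
```

In this sketch the error handler writes to a list that stands in for the data-import journal. A real handler would also log the error and publish it so that progress counting sees it.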

Story | Draft estimation
MODPUBSUB-167 Reconsider error handling in KafkaConsumerWrapper | 8
MODPUBSUB-168 Cover with tests folio-kafka-wrapper | 5
MODSOURMAN-474 Implement ProcessRecordErrorHandler for Kafka Consumers | 5
MODSOURCE-290 Implement ProcessRecordErrorHandler for Kafka Consumers | 3
MODINVOICE-252 Implement ProcessRecordErrorHandler for Kafka Consumers | 2
MODINV-408 Implement ProcessRecordErrorHandler for Kafka Consumers | 5


Kateryna Senchenko, Vladimir Shalaev, Ann-Marie Breaux: please review this document.