Data Import UI and CLI experiences

This is a summary of experiences using FOLIO Data Import to do MARC record test loads through the data import UI and CLI as of early June 2019. At this point, the CLI is available in release 2.1; the ability to use the data import UI to do test loads is available via https://folio-snapshot-load.aws.indexdata.com/. This environment should be used instead of folio-testing, folio-snapshot, or folio-snapshot-stable, especially when loading files with more than 10 or 20 records, so as not to compromise the performance of the other hosted reference environments.

Data Import UI

Instructions and explanations for using the data import UI to test loading of MARC records can be found in the Data Import Temporary MARC Bib Load Button PowerPoint presentation created by Ann-Marie Breaux, the data import product owner. If you wish to test loading MARC bib records through the UI, follow these steps:

  • Click on the data import icon in the top menu bar.
  • Follow the instructions in the righthand pane to drag and drop a file of MARC records, or select the file using the 'or choose files' button. The file MUST have a .mrc or .marc extension.
  • Click the caret (^) next to the word Files in the upper lefthand corner and click the resulting label "Load MARC bibliographic records".

At this point, you should review the Data Import Temporary MARC Bib Load Button PowerPoint presentation for explanations and next steps.

Data Import Command Line Interface (CLI)

The instructions for using the data import CLI can be found at: https://github.com/folio-org/mod-source-record-manager. The relevant section on that page begins at "Data Import Workflow".

Some points to be aware of when using the CLI:

  • The CLI accepts three types of input: raw MARC, MARC XML, and MARC JSON. Whatever the type, the interface expects every record to be supplied as a properly escaped JSON string. The Free Online JSON Escape / Unescape Tool can be used to escape input records.
  • When sending multiple records, enclose each record in double quotes (") and follow each closing quote with a comma (,), except after the last record, so that the records are serialized as an array of JSON strings.
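
As an alternative to escaping records by hand, the two points above can be sketched in Python, whose standard json module does the quoting, escaping, and comma placement. This is a minimal sketch, not part of the data import tooling: the function names are hypothetical, the raw MARC splitter assumes UTF-8 encoded records (MARC-8 files would need converting first), and the request body that wraps the array is described in the mod-source-record-manager documentation rather than here.

```python
import json
from typing import List

# End-of-record byte defined for binary MARC (ISO 2709).
RECORD_TERMINATOR = b"\x1d"


def split_raw_marc(data: bytes) -> List[str]:
    """Split a binary MARC file into one string per record.

    Assumption: the records are UTF-8 encoded.
    """
    chunks = data.split(RECORD_TERMINATOR)
    # split() drops the terminator, so put it back; skip empty trailing chunks.
    return [(c + RECORD_TERMINATOR).decode("utf-8") for c in chunks if c.strip()]


def to_json_string_array(records: List[str]) -> str:
    """Serialize records as an array of escaped JSON strings."""
    # json.dumps quotes each record, escapes embedded quotes, backslashes and
    # control characters, and separates the records with commas.
    return json.dumps(records, ensure_ascii=False, indent=2)


# Illustrative MARC XML and MARC JSON fragments (not complete records).
records = [
    '<record><leader>01234nam a2200000 a 4500</leader></record>',
    '{"leader": "01234nam a2200000 a 4500", "fields": []}',
]
print(to_json_string_array(records))
```

Parsing the output back with json.loads should return the original records unchanged, which is a quick way to check that nothing was mangled during escaping.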

Here are some sample files to show what the formatted records should look like:

Performance considerations if attempting a large file load via the CLI:

The modules that ingest and process records need more Java heap memory than the Index Data default of "-Xmx256m", which is what is generally set in the hosted "testing" and "snapshot" environments. To avoid crashing the modules in a production-ready Folio system during a load of 50,000 or more records, it was necessary to set the Java heap memory to "-Xmx4096m" for both mod-source-record-manager and mod-source-record-storage. It was also useful to set container limits, so that the load does not run away on the system and make the entire Folio deployment unresponsive by failing a host/node.

Texas A&M's Folio Q2.1 2019 instance is running on a K8s/Rancher cluster that hosts three Folio deployments in total, as well as a module descriptor registry deployment. Each of Texas A&M's 8 nodes has a 4-core CPU and 16GB of memory. Its Okapi and Folio module Postgres databases are kept separate, to avoid failed UI requests during heavy data loading.
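
The relationship between the heap setting and a container limit can be sketched with a little arithmetic. This is an illustration only: the 25% headroom for non-heap JVM memory is an assumption, not a figure from the Texas A&M deployment, and the function and key names are hypothetical. The only values taken from the text above are the 4096 MB heap, the two module names, and the 16GB of memory per node.

```python
import math


def jvm_container_settings(heap_mb: int, headroom: float = 0.25) -> dict:
    """Pair a Java heap size with a container memory limit.

    Assumption: the limit is the heap plus a fractional headroom for non-heap
    JVM memory; 25% is an illustrative value, not a measured one.
    """
    limit_mb = math.ceil(heap_mb * (1 + headroom))
    return {"java_option": f"-Xmx{heap_mb}m", "memory_limit_mb": limit_mb}


modules = ["mod-source-record-manager", "mod-source-record-storage"]
settings = {name: jvm_container_settings(4096) for name in modules}

# Check whether both containers would fit on a single 16GB node.
NODE_MEMORY_MB = 16 * 1024
total_limit_mb = sum(s["memory_limit_mb"] for s in settings.values())
fits_on_one_node = total_limit_mb <= NODE_MEMORY_MB

for name, s in settings.items():
    print(name, s["java_option"], f'{s["memory_limit_mb"]} MB limit')
print("fits on one node:", fits_on_one_node)
```

A limit set at or below the heap size would let the container be killed before the JVM reaches its own ceiling, which is why the sketch keeps the limit above "-Xmx".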

Experiences

Log of various test loads: record_update_testing_log.xlsx