2018-05-14 - Data Migration Subgroup Agenda and Notes

Date

Attendees

Goals

Discussion items

Welcome

Tod Olson convened the meeting.
  • Request a separate email list for the subgroup. Wayne Schneider took this as an action item.

MARC Batch Loader Subgroup

A new subgroup has formed: https://discuss.folio.org/t/marc-batch-loads-into-folio-new-subgroup/1792

  • Participation of SysOps? Wayne Schneider will be on the subgroup.
  • Relevance for SysOps / Data Migration?
  • Other contributions of the SysOps SIG: testing?

Discussion: There seems to be enough representation on the subgroup to meet the needs of the migration/SysOps group. If communication is needed, Wayne will be "keeping an ear out." Information and questions can be sent to Ann-Marie Breaux.

Migration Tools

  • Technical gaps in the migration tools
  • Tools that need to be developed
  • A common set of community tools for doing the migration

Discussion: This topic was combined with a review of WOLFcon. Wayne recapped the session he led there, which laid out requirements for a useful tool that could load different kinds of data. At that session, the group described a tool that could load many kinds of data into the relevant JSON storage format. In addition, the loader would be able to handle MARCXML and the MARC Format for Holdings Data for inventory records (instance, holdings, and item). The tool's requirements might also include managing storage; for example, it could manage the mapping between FOLIO UUIDs and legacy IDs.
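One possible way for a loader to manage the UUID → legacy ID mapping is sketched below. It assumes (this was not decided at the meeting) that name-based UUIDs (version 5) are derived from each record type and legacy ID, so re-running a load produces the same UUIDs. The namespace value, `folio_uuid_for`, and `IdMap` are hypothetical names for illustration only.

```python
import uuid

# Hypothetical, fixed namespace for one institution's migration.
# Assumption: not part of any FOLIO specification.
MIGRATION_NAMESPACE = uuid.UUID("8f1b2c3d-4e5f-4a6b-9c7d-0e1f2a3b4c5d")


def folio_uuid_for(record_type: str, legacy_id: str) -> str:
    """Derive a stable version 5 UUID from a record type and a legacy ID.

    The record type is part of the hashed name so that, for example,
    legacy item "123" and legacy user "123" do not get the same UUID.
    """
    return str(uuid.uuid5(MIGRATION_NAMESPACE, f"{record_type}:{legacy_id}"))


class IdMap:
    """In-memory two-way map between legacy IDs and generated UUIDs."""

    def __init__(self) -> None:
        self.to_uuid: dict = {}    # (record_type, legacy_id) -> UUID string
        self.to_legacy: dict = {}  # UUID string -> (record_type, legacy_id)

    def assign(self, record_type: str, legacy_id: str) -> str:
        """Generate (or regenerate) the UUID for a legacy record and record both directions."""
        new_id = folio_uuid_for(record_type, legacy_id)
        self.to_uuid[(record_type, legacy_id)] = new_id
        self.to_legacy[new_id] = (record_type, legacy_id)
        return new_id


id_map = IdMap()
item_uuid = id_map.assign("item", "123")
print(item_uuid, id_map.to_legacy[item_uuid])
```

Because a version 5 UUID is a hash of the namespace and the name, the same legacy record always maps to the same UUID. In a real migration, a persistent store would presumably replace the in-memory dictionaries.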

Dale would like the functions of MARC loading and managing storage in FOLIO to be optional.

We discussed the need for SysOps to have a product owner who could take this requirement and express it as a JIRA ticket so it could be evaluated and given developer time. Tod agreed to write a proposal for a PO to be reviewed at Thursday's meeting; Sharon Beltaine will share the Reporting SIG's request with Tod.

Wayne had several ideas about the loader. He thinks it should be a separate FOLIO module, and that a bulk load API might be designed to work for both migration and ongoing loads. Data migration requires very high performance and more error checking, and the data is not being loaded into a live system. The current API, which posts JSON records one at a time, is very slow: it took the University of Chicago 16 hours to load 70,000 users.
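For scale, the reported figure works out to roughly 70,000 / (16 × 3,600 s) ≈ 1.2 users per second. The sketch below computes that rate and shows the kind of batching a bulk load API might accept instead of one POST per record. The batch size and the helpers `records_per_second` and `batches` are hypothetical; no actual FOLIO bulk endpoint is assumed.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def records_per_second(record_count: int, hours: float) -> float:
    """Average load rate for `record_count` records loaded over `hours` hours."""
    return record_count / (hours * 3600)


def batches(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Split a stream of records into lists of at most `size` items.

    A hypothetical bulk endpoint could receive each list in one request,
    instead of one HTTP POST per record.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk


# The reported user load: 70,000 users in 16 hours.
rate = records_per_second(70_000, 16)
print(f"{rate:.2f} users/second")  # 1.22 users/second

# With a hypothetical batch size of 1,000, 70,000 users need 70 requests.
print(sum(1 for _ in batches(range(70_000), 1_000)))  # 70
```

Batching trades per-request overhead for larger payloads; the extra error checking that migration requires would then need to report failures per record within each batch.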


Test Data

Discussion: The goal is to provide data that looks and behaves like real data. Dale said we should be using real data in CI/CD and performance testing. Patty said that EBSCO intends to set up a demo system for a fictional university library, with various scenarios and realistic data. Perhaps other early implementers will want a similar sandbox.

The test data needs to include:

1. production-type data
2. engineered edge cases


Action items