FOLIO Wiki

The Problem (as stated in https://discuss.folio.org/t/reference-data-and-upgrades/2858, see also SYSOPS/Upgrades with Reference data)

A FOLIO upgrade, as it is currently implemented, involves replacing the set of modules enabled for a tenant with a new set of modules. Each module is responsible for upgrading its storage for the tenant in place – for example, updating existing records with new required fields, or adding a database index to improve performance.

A new version of a module (or a new module that was not previously included in the tenant’s module set) might contain new or updated reference data. It seems reasonable that an operator might choose to specify loadReference=true in the call to the tenant install API to load the new reference data.
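For illustration, the sketch below builds (but does not send) such an install request. The host, tenant ID, and module ID are placeholders, and the endpoint path, the tenantParameters query parameter, and the body shape are assumptions based on the Okapi documentation; check them against the Okapi version in use.

```python
import json
from urllib.parse import urlencode


def build_install_request(okapi_url, tenant_id, module_ids, load_reference=False):
    """Return (url, json_body) for a tenant install call. Nothing is sent."""
    url = f"{okapi_url}/_/proxy/tenants/{tenant_id}/install"
    if load_reference:
        # urlencode percent-encodes the "=" inside the parameter value
        url += "?" + urlencode({"tenantParameters": "loadReference=true"})
    # Assumed body shape: one entry per module to enable
    body = json.dumps([{"id": m, "action": "enable"} for m in module_ids])
    return url, body


# Placeholder values, not taken from a real deployment
url, body = build_install_request(
    "http://localhost:9130", "diku", ["mod-example-2.0.0"], load_reference=True
)
print(url)
print(body)
```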

As currently implemented in most modules, this will cause the module to attempt to load all reference data (not just new data). New records will be created if needed, and existing records (matched by UUID) will be overlaid.

As a result, problems arise if the tenant has altered or deleted any of the reference data that the module loaded when it was first enabled. Any changes will be overwritten with the system defaults, and deleted records will be re-created.
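The effect can be reproduced with a small in-memory model. This is a simplified sketch, not any module's actual loader: the dictionary store, the record shape, and the function name load_reference_data are all hypothetical.

```python
def load_reference_data(store, reference_records):
    """Upsert every shipped record by UUID: create it, or overwrite what is there."""
    for record in reference_records:
        store[record["id"]] = dict(record)


# Reference data shipped with the module (hypothetical records)
shipped = [
    {"id": "uuid-1", "name": "Home"},
    {"id": "uuid-2", "name": "Work"},
]

tenant_store = {}
load_reference_data(tenant_store, shipped)    # module first enabled

tenant_store["uuid-1"]["name"] = "Residence"  # tenant renames a record
del tenant_store["uuid-2"]                    # tenant deletes a record

load_reference_data(tenant_store, shipped)    # upgrade with loadReference=true

print(tenant_store["uuid-1"]["name"])  # the local rename has been overwritten
print("uuid-2" in tenant_store)        # the deleted record has been re-created
```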

More subtle problems arise if the record type in question has data constraints (for example, the requirement that a particular property be unique), and the tenant has created a new record of that type which causes a conflict with incoming reference data. As currently implemented, this kind of conflict causes the module upgrade to fail, potentially leaving the tenant data in an inconsistent state.
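The following sketch reproduces this failure mode with SQLite standing in for the module's storage. The table, the records, and the helper upsert_by_id are hypothetical, and the connection runs in autocommit mode on the assumption that the load is not wrapped in a single transaction.

```python
import sqlite3

# Autocommit: each statement is committed on its own, with no enclosing transaction
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE address_type (id TEXT PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
)


def upsert_by_id(conn, record_id, name):
    """Overlay the record matched by ID, or insert it if no such ID exists."""
    cur = conn.execute(
        "UPDATE address_type SET name = ? WHERE id = ?", (name, record_id)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO address_type (id, name) VALUES (?, ?)", (record_id, name)
        )


upsert_by_id(conn, "ref-1", "Home")    # loaded when the module was first enabled
upsert_by_id(conn, "local-1", "Work")  # created later by the tenant

# The new module version ships a record whose name collides with the tenant's record
new_reference = [("ref-1", "Home address"), ("ref-2", "Work"), ("ref-3", "Billing")]

loaded, error = [], None
try:
    for record_id, name in new_reference:
        upsert_by_id(conn, record_id, name)
        loaded.append(record_id)
except sqlite3.IntegrityError as exc:
    error = exc  # unique constraint on name is violated by ref-2

rows = conn.execute("SELECT id, name FROM address_type ORDER BY id").fetchall()
print(loaded, error)
print(rows)
```

In this model the first record is updated, the second aborts the load, and the third is never applied, which is the partially upgraded state described above.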

These kinds of issues would very likely also arise if an operator specified loadSample=true in an upgrade, but that is currently untested, and seems like an unlikely use case, at least for production.

Desired Upgrade Behaviors

The upgrade process should leave the system in a usable state unless a truly fatal error is encountered.
1. Upgrade scripts should handle errors rigorously so that trivial errors do not derail the process.
2. When fatal errors occur, the output should clearly indicate where the error occurred.
The upgrade process should produce a log output of changes made.
The upgrade process should be able to run in a simulated mode to facilitate planning.
The upgrade process should account for the presence of non-system data and preserve it entirely.
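A minimal sketch of how these behaviors could fit together is shown below. It is not an existing FOLIO mechanism: the function run_upgrade, the protected_ids bookkeeping of tenant-created or tenant-modified records, and the log format are all assumptions made for illustration.

```python
import copy


def run_upgrade(store, incoming, protected_ids, simulate=False):
    """Apply incoming records to the store and return a log of changes.

    With simulate=True the work is done on a copy, so the store is untouched.
    """
    working = copy.deepcopy(store) if simulate else store
    log = []
    for record in incoming:
        record_id = record.get("id")
        if record_id is None:
            # Trivial error: report it and continue with the next record
            log.append(f"ERROR record without id skipped: {record!r}")
            continue
        if record_id in protected_ids:
            log.append(f"SKIP {record_id}: tenant-modified record preserved")
        elif record_id not in working:
            working[record_id] = dict(record)
            log.append(f"CREATE {record_id}")
        elif working[record_id] != record:
            working[record_id] = dict(record)
            log.append(f"UPDATE {record_id}")
    return log


store = {
    "a": {"id": "a", "name": "Home"},
    "b": {"id": "b", "name": "Office"},  # changed locally by the tenant
}
incoming = [
    {"id": "a", "name": "Home address"},
    {"id": "b", "name": "Work"},
    {"id": "c", "name": "Billing"},
    {"name": "broken"},  # malformed record
]

plan = run_upgrade(store, incoming, protected_ids={"b"}, simulate=True)
for line in plan:
    print(line)
```

Running the same call with simulate=False would apply the planned changes to the store and return the same kind of log.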

Proposal One

Eliminate Reference Data as a category.
Create a new category called "System Data" which holds all schema data and records currently loaded by specifying loadReference=true.
- Make this category immutable towards the system operator/users.

Create a category called "Overlay Data" which is changeable by system operator/users and is used to modify values or schema in "System Data".
- This will allow the system to avoid overwriting user-specified values when performing system upgrades.

Proposed Data Types and Definitions for Proposal One

Name	Definition/notes	Overwritten on upgrade	Immutable (towards system operator)	Example of data stored
System Data	Data necessary for operation of the system. These should be values that are immutable toward the user or system operator. Example: sane defaults for field labels. Note: at present, these sorts of default values for certain modules are only loaded when you specify loadReference=true on module initialization. This proposes we move those into an immutable category.	YES	YES	schema_book{ title(string,128) subtitle(string,256) authors(string,128) publisher(string,128) release_date(integer,4) blurb(string,32768) }
Overlay Data	Data that may be used to supersede System Data when a user or system operator wants to change something immutable. Example: a default field label. Can be used to override any System Data, as well as to introduce new values.	NO	NO	schema_book{ uuid(integer,64) subtitle(NA) authors(string,256) release_date(string,64) }
Sample Data/User Data	Data that can be used to demo the system or is useful for providing examples to users. Data entered by users specific to an institution. This data is not necessary for the operation of the system. Example: a user record for a fake (or real) patron. This layer also holds sample data, since it's essentially real data for a fictional tenant.	NO. To load example data (like diku), introduce another switch (LoadExampleData=true).	NO	book: uuid: asdf034 title: 1984 authors: George Orwell publisher: Secker & Warburg release date: June 8, 1949 blurb: Nineteen Eighty-Four: A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell. It was published on 8 June 1949 by Secker & Warburg as Orwell's ninth and final book completed in his lifetime.
--- Result ---				schema_book{ uuid(integer,64) title(string,128) authors(string,256) publisher(string,128) release_date(string,64) blurb(string,32768) } book: uuid: asdf034 title: 1984 authors: George Orwell publisher: Secker & Warburg release date: June 8, 1949 blurb: Nineteen Eighty-Four: A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell. It was published on 8 June 1949 by Secker & Warburg as Orwell's ninth and final book completed in his lifetime.
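The way the Result row follows from the System Data and Overlay Data rows can be sketched as a simple merge. This is one possible reading of the table, not a specified algorithm: the function merge_schema is hypothetical, and treating subtitle(NA) as "remove this field" is an assumption inferred from the absence of subtitle in the Result row.

```python
REMOVED = None  # stands in for the table's "NA" marker (assumed to mean "drop the field")


def merge_schema(system, overlay):
    """Return the effective schema: overlay entries win, REMOVED entries are dropped."""
    merged = dict(system)
    for field, spec in overlay.items():
        if spec is REMOVED:
            merged.pop(field, None)
        else:
            merged[field] = spec  # override an existing field or add a new one
    return merged


system_schema = {
    "title": ("string", 128),
    "subtitle": ("string", 256),
    "authors": ("string", 128),
    "publisher": ("string", 128),
    "release_date": ("integer", 4),
    "blurb": ("string", 32768),
}
overlay_schema = {
    "uuid": ("integer", 64),
    "subtitle": REMOVED,
    "authors": ("string", 256),
    "release_date": ("string", 64),
}

effective = merge_schema(system_schema, overlay_schema)
for field, spec in effective.items():
    print(field, spec)
```

Because the overlay is stored separately and only combined at read time, an upgrade can replace system_schema wholesale without touching the local changes.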

Proposal Two

Create a data layer on top of reference data that allows users to overlay system-provided values with local values.

Proposed Data Types and Definitions for Proposal Two

Data type	Notes	Behavior on module upgrade	Examples
System data	Data that are necessary for operation of the system. These should be values that are immutable toward the user or system operator (but may be visible in the UI as a value list, for example).	Overwrite	Inventory item statuses
Reference data	Data that are referred to by other records in the system, which may be optionally loaded on module initialization using the loadReference tenant parameter.	Overlay	User address types Inventory controlled vocabularies
User/sample data	Data that are created by the user, or loaded using the loadSample tenant parameter.	Upgrade	Users Inventory instances
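The three upgrade behaviors in the table can be sketched as a per-type policy. The reading of each behavior here is an assumption: "Overwrite" replaces the stored value with the shipped one, "Overlay" keeps local values on top of the shipped ones, and "Upgrade" migrates the existing record in place. The names UpgradePolicy, POLICY_BY_TYPE, and record_after_upgrade are hypothetical.

```python
from enum import Enum


class UpgradePolicy(Enum):
    OVERWRITE = "overwrite"
    OVERLAY = "overlay"
    UPGRADE = "upgrade"


# Mapping taken from the "Behavior on module upgrade" column
POLICY_BY_TYPE = {
    "system": UpgradePolicy.OVERWRITE,
    "reference": UpgradePolicy.OVERLAY,
    "user": UpgradePolicy.UPGRADE,
}


def record_after_upgrade(data_type, shipped=None, local=None, migrate=None):
    """Return the record as seen after an upgrade, under the assumed policies."""
    policy = POLICY_BY_TYPE[data_type]
    if policy is UpgradePolicy.OVERWRITE:
        return dict(shipped)                    # shipped value always wins
    if policy is UpgradePolicy.OVERLAY:
        return {**shipped, **(local or {})}     # local values sit on top
    return migrate(dict(local))                 # existing record migrated in place


status = record_after_upgrade(
    "system", shipped={"name": "Available"}, local={"name": "On shelf"}
)
address_type = record_after_upgrade(
    "reference",
    shipped={"addressType": "Home", "desc": "Home address"},
    local={"addressType": "Residence"},
)
user = record_after_upgrade(
    "user",
    local={"username": "jdoe"},
    migrate=lambda r: {**r, "active": True},  # e.g. a new required field
)
print(status, address_type, user)
```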