
Problem Statement

When upgrading a FOLIO system by calling the Okapi tenant install API with a list of modules to upgrade, an operator may choose to specify the loadReference=true tenant parameter. This may be required, for example, for the tenant to take advantage of a new controlled vocabulary specified by a FOLIO SIG, or to get an update to an existing controlled vocabulary. As currently implemented in most modules, this will cause the module to attempt to load all reference data (not just new data). New records will be created if needed, and existing records (matched by UUID) will be overlaid.

This can lead to the following issues:

  • If the tenant has altered or deleted any of the reference data loaded by the module when it was first enabled (which is possible in some cases both using the reference UI and using the module APIs), any changes will be overwritten with the system defaults, and deleted records will be re-created.
  • If the reference data have data constraints (for example, the requirement that a particular property be unique), and the tenant has created a new reference record which causes a conflict with incoming reference data, the upgrade will fail, and the system will be left in an inconsistent state.
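Both issues can be reproduced with a minimal in-memory sketch of the overlay behaviour described above. It is not taken from any FOLIO module: the function name, the short record ids standing in for UUIDs, and the dictionary used as a table are all assumptions made for illustration.

```python
class UniqueConstraintError(Exception):
    """Raised when an incoming record violates a unique property."""


def load_reference_overlay(table, incoming, unique_key):
    # Assumed model of the current behaviour: upsert every record by id,
    # aborting on the first unique-constraint conflict.
    for record in incoming:
        for existing in table.values():
            if existing["id"] != record["id"] and existing[unique_key] == record[unique_key]:
                raise UniqueConstraintError(record["id"])
        table[record["id"]] = dict(record)


# System defaults shipped with the module (hypothetical ids and values).
defaults = [
    {"id": "u1", "group": "staff"},
    {"id": "u2", "group": "faculty"},
    {"id": "u3", "group": "graduate"},
]

# Tenant state: u1 was relabelled, u2 and u3 were deleted, and a local
# record x9 now uses the name "graduate".
tenant = {
    "u1": {"id": "u1", "group": "Personal"},
    "x9": {"id": "x9", "group": "graduate"},
}

try:
    load_reference_overlay(tenant, defaults, "group")
    failed = False
except UniqueConstraintError:
    failed = True
```

After the call, the local label on u1 has been overwritten, the deleted u2 has been re-created, and the load has aborted on u3, leaving the table partially updated.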

For more background, see: https://discuss.folio.org/t/reference-data-and-upgrades/2858

Examples

  • The reference data provided by mod-inventory-storage include labels in English. In order to provide labels in the local language, users must update the records (either in the UI or using the API). These labels are overwritten on upgrade.
  • The reference data provided by mod-users include patron groups. It is possible to manage those data in the users settings UI (and in fact it is perfectly reasonable to want to customize patron groups). Patron groups have a unique constraint on the "group" property, so if the tenant has created a patron group that conflicts with incoming reference data, the upgrade will fail.
  • The reference data provided by mod-inventory-storage includes a location hierarchy for Københavns Universitet. Locations can be managed in the tenant settings UI. It is very likely that a tenant will wish to remove the provided location hierarchy and create their own. Locations have unique constraints on the name and code. If the tenant creates a new location that conflicts with incoming data, the upgrade will fail.

Underlying issues:

We see this as a three-fold problem:

  1. The upgrade fails if there's a problem with loadReference=true, and leaves the system in an inconsistent state. It should not fail; it should report errors and continue.
    1. Note: This is worse for non-RMB modules, which ignore the loadReference parameter and do their own thing.
  2. There is no way to get an output of what would be modified. The upgrade steps are all launched through Okapi as fire-and-forget, so the upgrade either fails or completes, with no interactive component and no ability to simulate.
  3. There are no clear boundaries between the different types of data packaged with FOLIO, reference data vs. sample data. E.g. service points for the DIKU tenant are loaded as reference data. A side effect is that tenants must edit the reference data according to local needs.
    1. This is a pitfall for upgrades, since modifications by users (which they are encouraged and enabled to make through the GUI) can either break the upgrade or be overwritten, so this is a potential data-loss scenario.
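As an illustration of the first two points, the sketch below shows one way a loader could report conflicts and continue instead of aborting, and offer a dry run that lists what would be modified. It is a sketch under assumptions, not an existing Okapi or module feature: the function, the dry_run flag, and the report structure are hypothetical. It still overwrites matching records, so it does not address the third point.

```python
def load_reference(table, incoming, unique_key, dry_run=False):
    """Hypothetical loader: collect errors and continue; optionally simulate."""
    report = {"created": [], "overwritten": [], "errors": []}
    # In a dry run, work on a copy so the real table is never touched.
    target = {k: dict(v) for k, v in table.items()} if dry_run else table
    for record in incoming:
        rid = record["id"]
        conflict = next(
            (e["id"] for e in target.values()
             if e["id"] != rid and e[unique_key] == record[unique_key]),
            None,
        )
        if conflict is not None:
            # Log the problem for the administrator and keep going.
            report["errors"].append((rid, f"{unique_key} conflicts with {conflict}"))
            continue
        if rid not in target:
            report["created"].append(rid)
        elif target[rid] != record:
            report["overwritten"].append(rid)
        target[rid] = dict(record)
    return report


# Hypothetical data: short ids stand in for UUIDs.
defaults = [
    {"id": "u1", "group": "staff"},
    {"id": "u2", "group": "faculty"},
    {"id": "u3", "group": "graduate"},
]
tenant = {
    "u1": {"id": "u1", "group": "Personal"},
    "x9": {"id": "x9", "group": "graduate"},
}

preview = load_reference(tenant, defaults, "group", dry_run=True)
```

Here preview lists u2 as created, u1 as overwritten, and u3 as an error, while the tenant table itself is left unchanged.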

We do have UIs for editing Reference Data. Tenants are able, encouraged, and need to modify some of the reference data according to local circumstances. Upgrades should be able to preserve these changes while bringing in system-supplied updates.

Overall goal:

Upgrades have to be a fire-and-forget mechanism: the operator should not have to worry about deleted custom data, outdated system data, and so on.

The end result should not leave the system in an inconsistent or unusable state. Upgrades should continue unless failures encountered are truly fatal to the process and it cannot continue without a particular step succeeding.

Minor errors may happen and should be logged clearly for remediation by the system administrator.

Currently FOLIO delivers Reference Data and Sample Data, and these are not clearly distinguished. We would like to see:

  1. A more rigorous definition of reference data.
  2. A system that treats the base reference records as immutable and introduces local updates as overlays on top of the base (like a customized view of the record for the tenant).

What follows is one possible model.

Proposal:

First, make a clear distinction between sample data and what we call reference data.

Second, split what we now call 'Reference data' into 3 layers or categories with different degrees of protection:

Each layer is described below by its usage, whether it is overwritten on upgrade, whether it is immutable (towards user), and an example of data stored.

System data
  • Usage: Core information the module can not work without
  • Overwritten on upgrade: YES
  • Immutable (towards user): YES
  • Example of data stored: PORT=9135; DEPENDENCY=mod_login, mod_permission

Reference data
  • Usage: Predefined users/groups, schemas, $STUFF to work with
  • Overwritten on upgrade: On admin request (LoadReferenceData=true)
  • Immutable (towards user): YES
  • Example of data stored: schema_book{ title(string,128), subtitle(string,256), authors(string,128), publisher(string,128), release_date(integer,4), blurb(string,32768) }

Custom data
  • Usage: Can override any given reference & system data, as well as introducing new values. Whatever the user changes is stored here. This layer will hold sample data, since it's essentially real data for a fictional tenant.
  • Overwritten on upgrade: NO (or only if the admin wishes to start with clean slate, like apt-get purge on Debian). To load example data (like diku), introduce another switch (LoadExampleData=true).
  • Immutable (towards user): No
  • Example of data stored: schema_book{ uuid(integer,64), subtitle(NA), authors(string,256), release_date(string,64) }; PORT=10587; User: JohnDoe (Admin)

--- Result --- (this is not a layer, just what the module works with in the end)
  • Overwritten on upgrade: no layer
  • Immutable (towards user): no layer
  • Example of data stored: PORT=10587; DEPENDENCY=mod_login, mod_permission; schema_book{ uuid(integer,64), title(string,128), authors(string,256), publisher(string,128), release_date(string,64), blurb(string,32768) }; User: JohnDoe (Admin)
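The schema_book entry of the Result can be reproduced with a small merge sketch. The dictionaries and the merge_schema function are hypothetical, and None is assumed here to stand in for the table's NA marker (a field removed by the custom layer).

```python
# Reference-layer schema, as given in the table.
REFERENCE_SCHEMA = {
    "title": ("string", 128),
    "subtitle": ("string", 256),
    "authors": ("string", 128),
    "publisher": ("string", 128),
    "release_date": ("integer", 4),
    "blurb": ("string", 32768),
}

# Custom-layer overrides; None plays the role of "NA" (assumption).
CUSTOM_SCHEMA = {
    "uuid": ("integer", 64),
    "subtitle": None,
    "authors": ("string", 256),
    "release_date": ("string", 64),
}


def merge_schema(reference, custom):
    """Overlay custom field definitions on the reference schema."""
    merged = dict(reference)
    for field, spec in custom.items():
        if spec is None:
            merged.pop(field, None)  # field hidden by the custom layer
        else:
            merged[field] = spec     # field added or overridden
    return merged


result = merge_schema(REFERENCE_SCHEMA, CUSTOM_SCHEMA)
```

The reference schema itself is never modified, so an upgrade can replace it without touching the tenant's overrides.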

System data is meant to consist only of information whose absence causes the module to fail critically. It may be overridden by reference data or the custom overlay, but it must always be available, like the network port to use.

Reference data is a set of data that helps in using the module. Omitting it is possible, although basic functions of the module may not be available or may not work as expected (e.g. default user groups & permissions).

Custom data is a layer which keeps all user changes. It is also host to any example data which is loaded (like diku) and may not be altered during upgrades (notable exception: the admin explicitly wishes to overwrite EVERYTHING changed by their users, which will empty the entire layer). Once introduced, even example data is not updated during upgrades, to prevent any changes to custom settings.

This introduces a transparent layered structure, in which the user will see data from all layers within the settings screen and may alter everything there, although the changes will only be saved to the custom data layer. Modules should look for information in the following order, stopping at the first hit:

  1. Custom data layer
  2. Reference data layer
  3. System data layer

Modules apply custom settings/data first, falling back to the reference settings/data in case of missing information and further falling back to system data if necessary. 
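This lookup order maps directly onto Python's collections.ChainMap, which searches its mappings in order and writes only to the first one. The sketch below is illustrative only; the default_group key and its values are hypothetical.

```python
from collections import ChainMap

# The three layers as plain dictionaries (contents are illustrative).
system = {"PORT": 9135, "DEPENDENCY": ["mod_login", "mod_permission"]}
reference = {"default_group": "staff"}  # hypothetical reference setting
custom = {"PORT": 10587}

# Lookups try custom first, then reference, then system.
settings = ChainMap(custom, reference, system)

port = settings["PORT"]                  # found in the custom layer
dependencies = settings["DEPENDENCY"]    # falls through to the system layer

# A change made through the settings view lands in the custom layer only.
settings["default_group"] = "student"
```

After the assignment, reference still holds the original default_group value, while lookups through settings return the tenant's override.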

This way updating critical information and even reference data is possible without breaking the updating process, since both layers are untouched by users. Data which is stored in the custom layer may break some newly introduced features, making meaningful logging and error reporting towards the user necessary.
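The per-layer upgrade rules described above can be sketched as follows. The function and its parameters (load_reference, purge_custom) are hypothetical names chosen for illustration and do not correspond to an existing Okapi or module API.

```python
def upgrade(layers, new_system, new_reference, load_reference=False, purge_custom=False):
    """Return the layers after an upgrade, following the proposed rules."""
    return {
        # System data is always replaced by the new module version.
        "system": dict(new_system),
        # Reference data is replaced only on admin request.
        "reference": dict(new_reference) if load_reference else dict(layers["reference"]),
        # Custom data is kept unless the admin asks for a clean slate.
        "custom": {} if purge_custom else dict(layers["custom"]),
    }


# Illustrative layer contents.
before = {
    "system": {"PORT": 9135},
    "reference": {"groups": ["staff"]},
    "custom": {"PORT": 10587},
}
new_system = {"PORT": 9135, "DEPENDENCY": ["mod_login", "mod_permission"]}
new_reference = {"groups": ["staff", "faculty"]}

plain = upgrade(before, new_system, new_reference)
with_reference = upgrade(before, new_system, new_reference, load_reference=True)
clean_slate = upgrade(before, new_system, new_reference, purge_custom=True)
```

In all three cases the tenant's custom layer is never merged with incoming data, so a conflict between local changes and shipped defaults cannot abort the upgrade.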




4 Comments

  1. I imagine:

    system data / reference data

    (= a minimal set of data needed to operate the system. Can not be changed, but may be overlaid)

    (immutable, not overwritten on upgrade; but will be changed (by the system) if an upgrade is made)

    examples:

    patron groups: faculty, staff, graduate, undergraduate

    pre-loaded resource types: text, sound, other

    1 example loan policy (loan period: 60 days, grace period: 7 days, ...)

    Instance status types : catalogued, batch loaded, not catalogued

    a default location (e.g. "City Campus")

    default language, locale, time zone

    ...

    examples for system data that we do not even see in the UI:

    ... ?


    custom data / overlaid reference data

    (local modifications of reference data: deletions, overlays, additions)

    (not overwritten on upgrade; changeable by the user/tenant)

    (not necessary to run the system - but needed by the tenant!)

    Examples:

    Institution doesn't distinguish between grad and undergrad students, so it de-activates ("deletes") "undergraduate" and renames (overlays) "graduate" to "student".

    Tenant adds additional resource types: cd, map, electronic .....

    Additional loan periods; standard loan period changed to 30 days

    Renaming the standard location, adding more locations

    Change system language to "Italian", change locale, time zone

    ...


    sample data

    large sets of data loaded only once, when the module is installed for the first time.

    Illustrative examples to populate the database.

    May be needed for tests; maybe even performance tests.

    Needed for demonstration purposes (showing the system to a new, potential client)

    Will be essentially deleted when the system becomes productive.

    Will not be upgraded. Can be changed by the tenant / user.

    Examples:

    . anonymized, sample user data ("Xenia Sample") with fictitious names and addresses

    . large sets of inventory data (but not the real data which the tenant holds)

    . a larger set of material types

    . a larger set of contributor types

    . a larger set of identifier types

    ...

  2. I agree that there should be some sample data category. This is very useful from a development and testing point of view.


    I also want to point out that a big part of this is that the upgrade process should not fail if any of the included reference or sample data has been changed or removed. This will involve some extra tooling or features from Okapi, I believe, as these requests to enable new module versions for a tenant and load/use/upgrade whatever data happen at the tenant API there.


    Lastly, if there is some desire to get away from the "fire and forget" method of upgrades, we need to make that clear. I am a fan of better visibility during the upgrade process. Some sort of validation of what is going to be run, or something declarative from Okapi saying "this is what is going to take place: foobar".

  3. Data type examples, for some context, that I propose:


    System data to include: A default Job profile needed for functionality from data-import, permissions for modules, service accounts for parts of the system to function/interact with other parts of the system (pub-sub)


    Reference data/overlaid data to include: Lookup tables


    Sample data to include: A default tenant and its locations, example patron groups, example loan policies, example circulation policies, example permission sets for staff/users of Folio


    Custom data to include: What tenant and locations/service points the customer/operator defines, custom material types, organization-specific patron groups

  4. As sample data is just something that is introduced by a user of another library, it's already there - within the custom data layer:

    "Custom data is a layer which keeps all users changes. It is also host to any example data which is loaded (like diku)..."