
The Problem (as stated in https://discuss.folio.org/t/reference-data-and-upgrades/2858)

A FOLIO upgrade, as it is currently implemented, involves replacing the set of modules enabled for a tenant with a new set of modules. Each module is responsible for upgrading its storage for the tenant in place – for example, updating existing records with new required fields, or adding a database index to improve performance.

A new version of a module (or a new module that was not previously included in the tenant’s module set) might contain new or updated reference data. It seems reasonable that an operator might choose to specify loadReference=true in the call to the tenant install API to load the new reference data.

As currently implemented in most modules, this will cause the module to attempt to load all reference data (not just new data). New records will be created if needed, and existing records (matched by UUID) will be overlaid.

Due to this, issues arise if the tenant has altered or deleted any of the reference data loaded by the module when it was first enabled. Any changes will of course be overwritten with the system default, and deleted records will be re-created.
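The overlay behavior described above can be illustrated with a minimal sketch. It is not FOLIO module code: the `load_reference` function, the in-memory dict standing in for tenant storage, and the sample records are all hypothetical.

```python
# Hypothetical illustration only: a dict keyed by UUID stands in for a
# module's tenant storage; this is not actual FOLIO code.

def load_reference(storage, reference_records):
    """Overlay every shipped reference record onto storage, matched by id."""
    for record in reference_records:
        # Existing records are overwritten; missing ones are (re-)created.
        storage[record["id"]] = dict(record)


shipped = [
    {"id": "mt-1", "name": "book"},
    {"id": "mt-2", "name": "dvd"},
]

storage = {}
load_reference(storage, shipped)       # initial enable with loadReference=true

storage["mt-1"]["name"] = "Monograph"  # tenant renames a record
del storage["mt-2"]                    # tenant deletes a record

load_reference(storage, shipped)       # upgrade with loadReference=true

print(storage["mt-1"]["name"])  # the tenant's rename is lost
print("mt-2" in storage)        # the deleted record is back
```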

More subtle problems arise if the record type in question has data constraints (for example, the requirement that a particular property be unique), and the tenant has created a new record of that type which causes a conflict with incoming reference data. As currently implemented, this kind of conflict causes the module upgrade to fail, potentially leaving the tenant data in an inconsistent state.

These kinds of issues would very likely also arise if an operator specified loadSample=true in an upgrade, but that is currently untested, and seems like an unlikely use case, at least for production.

 Desired Upgrade Behaviors

  1. The upgrade process should leave the system in a usable state unless a truly fatal error is encountered.
    1. Upgrade scripts should handle errors rigorously, so that trivial errors do not derail the process.
    2. When fatal errors occur, the output should clearly indicate where the error occurred.
  2. The upgrade process should produce a log output of changes made.
  3. The upgrade process should be able to run in a simulated mode to facilitate planning.
  4. The upgrade process should account for the presence of non-system data and preserve it entirely.
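A rough sketch of how these behaviors could fit together is shown here. Everything in it is an assumption made for illustration: the `run_upgrade` and `insert_record` names, the `dry_run` flag, the log format, and the unique `code` property are hypothetical and not part of any FOLIO API.

```python
# Hypothetical sketch: names and data shapes are assumptions, not FOLIO APIs.

class ConstraintViolation(Exception):
    """Raised when an incoming record conflicts with existing tenant data."""


def insert_record(storage, record):
    """Insert a record, enforcing a unique 'code' property (assumed constraint)."""
    for existing in storage.values():
        if existing["code"] == record["code"]:
            raise ConstraintViolation(f"code {record['code']!r} already in use")
    storage[record["id"]] = dict(record)


def run_upgrade(storage, incoming, dry_run=False):
    """Apply incoming records; return a log of (status, record id, detail)."""
    # In simulated mode, work on a copy so the tenant data is left untouched.
    target = {k: dict(v) for k, v in storage.items()} if dry_run else storage
    log = []
    for record in incoming:
        if record["id"] in target:
            # Existing records are left alone in this sketch, so local data survives.
            log.append(("skipped", record["id"], "already present"))
            continue
        try:
            insert_record(target, record)
            log.append(("added", record["id"], ""))
        except ConstraintViolation as err:
            # A per-record conflict is logged and the upgrade carries on.
            log.append(("error", record["id"], str(err)))
    return log


storage = {"local-1": {"id": "local-1", "code": "uhd", "name": "UHD Blu-Ray"}}
incoming = [
    {"id": "sys-1", "code": "uhd", "name": "Ultra HD Blu-ray"},
    {"id": "sys-2", "code": "bd", "name": "Blu-ray"},
]

simulated_log = run_upgrade(storage, incoming, dry_run=True)
for entry in simulated_log:
    print(entry)
```

Running with `dry_run=True` first gives the operator the change log for planning without modifying tenant data. Calling `run_upgrade` again without the flag applies the same steps for real.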

Proposal One

  • Eliminate Reference Data as a category.
  • Create a new category called "System Data" which holds all schema data and records currently loaded by specifying loadReference=true.
    • Make this category immutable towards the system operator/users.
  • Create a category called "Overlay Data" which is changeable by system operator/users and is used to modify values or schema in "System Data".
    • This will allow the system to avoid overwriting user-specified values when performing system upgrades.


Proposed Data Types and Definitions for Proposal One


Name | Definition/notes | Overwritten on upgrade | Immutable (towards system operator) | Example of data stored
System Data

Data necessary for operation of the system. These should be values that are immutable toward the user or system operator. Example: sane defaults for field labels.

Note: at present, these sorts of default values for certain modules are only loaded when you specify loadReference=true on module initialization. This proposal would move them into an immutable category.

Overwritten on upgrade: YES

Immutable (towards system operator): YES

schema_book{
  • title(string,128)
  • subtitle(string,256)
  • authors(string,128)
  • publisher(string,128)
  • release_date(integer,4)
  • blurb(string,32768)

}

Overlay Data

Data that may be used to supersede System Data when a user or system operator wants to change something immutable. Example: a default field label.

Can be used to override any System Data, as well as to introduce new values.

Overwritten on upgrade: NO

Immutable (towards system operator): NO

schema_book{
  • uuid(integer,64)
  • subtitle(NA)
  • authors(string,256)
  • release_date(string,64)

}

Sample Data/User Data

Data that can be used to demo the system or is useful for providing examples to users.

Data entered by users specific to an institution.

This data is not necessary for the operation of the system. Example: a user record for a fake (or real) patron.

This layer also holds sample data, since it's essentially real data for a fictional tenant.

Overwritten on upgrade: NO

To load example data (like diku), introduce another switch (LoadExampleData=true).

Immutable (towards system operator): NO

book:

uuid: asdf034

title: 1984

authors: George Orwell

publisher: Secker & Warburg

release date: June 8, 1949

blurb: Nineteen Eighty-Four: A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell. It was published on 8 June 1949 by Secker & Warburg as Orwell's ninth and final book completed in his lifetime.

--- Result ---


schema_book{

  • uuid(integer,64)
  • title(string,128)
  • authors(string,256)
  • publisher(string,128)
  • release_date(string,64)
  • blurb(string,32768)

}

book:

uuid: asdf034

title: 1984

authors: George Orwell

publisher: Secker & Warburg

release date: June 8, 1949

blurb: Nineteen Eighty-Four: A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell. It was published on 8 June 1949 by Secker & Warburg as Orwell's ninth and final book completed in his lifetime.
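The way the Result schema follows from the System Data and Overlay Data examples can be sketched as a simple merge. This is only one possible reading of the table: the `effective_schema` function is hypothetical, and treating `NA` as "hide this system field" is an assumption inferred from `subtitle` being absent from the Result.

```python
# Hypothetical sketch of the System Data / Overlay Data merge in the table
# above. Field specs are (type, size) tuples; None stands for the "NA" marker.

SYSTEM_SCHEMA = {
    "title": ("string", 128),
    "subtitle": ("string", 256),
    "authors": ("string", 128),
    "publisher": ("string", 128),
    "release_date": ("integer", 4),
    "blurb": ("string", 32768),
}

OVERLAY_SCHEMA = {
    "uuid": ("integer", 64),
    "subtitle": None,             # NA: assumed to hide the system field
    "authors": ("string", 256),
    "release_date": ("string", 64),
}


def effective_schema(system, overlay):
    """Return the schema seen by users: overlay entries win over system ones."""
    # Fields that exist only in the overlay come first, as in the Result.
    merged = {name: spec for name, spec in overlay.items()
              if name not in system and spec is not None}
    for name, spec in system.items():
        if name in overlay:
            spec = overlay[name]
        if spec is not None:      # drop fields the overlay marks as NA
            merged[name] = spec
    return merged


result = effective_schema(SYSTEM_SCHEMA, OVERLAY_SCHEMA)
for name, (kind, size) in result.items():
    print(f"{name}({kind},{size})")
```

Because the system schema is never edited in place, an upgrade could replace `SYSTEM_SCHEMA` wholesale and recompute the effective schema without touching the overlay.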



Proposal Two

  • Create a data layer on top of reference data that allows users to overlay system-provided values with local values.

Proposed Data Types and Definitions for Proposal Two


Data type | Notes | Behavior on module upgrade | Examples
System data | Data that are necessary for operation of the system. These should be values that are immutable toward the user or system operator (but may be visible in the UI as a value list, for example). | Overwrite | Inventory item statuses
Reference data | Data that are referred to by other records in the system, which may be optionally loaded on module initialization using the loadReference tenant parameter. | Overlay | User address types; Inventory controlled vocabularies
User/sample data | Data that are created by the user, or loaded using the loadSample tenant parameter. | Upgrade | Users; Inventory instances
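The "Overlay" behavior for reference data is not specified further on this page. One possible reading, following the Material Types expectations worked out in the comments below, is a per-property three-way merge between the previously shipped record, the tenant's current record, and the newly shipped record. The sketch assumes the previously shipped version is still available for comparison, which this page does not establish. The `overlay_record` name and the record shapes are hypothetical, and removal of records by an upgrade is deliberately left out.

```python
# Hypothetical sketch: assumes the previously shipped version ("base") of each
# reference record is available for comparison. Not FOLIO code.

def overlay_record(base, current, incoming):
    """Merge one reference record; None means the record does not exist.

    base     -- record as shipped with the installed version
    current  -- record as it exists in the tenant now
    incoming -- record as shipped with the new version (must not be None)
    """
    if base is None:
        # New in this release: add it unless the tenant already has that id.
        return dict(incoming) if current is None else dict(current)
    if current is None:
        return None                      # tenant deleted it: do not reload
    merged = dict(current)
    for key, new_value in incoming.items():
        user_changed = current.get(key) != base.get(key)
        if not user_changed:
            merged[key] = new_value      # take the shipped update
    return merged


base = {"id": "mt-3", "name": "BD-ROM"}
incoming = {"id": "mt-3", "name": "Blu-ray"}

print(overlay_record(base, dict(base), incoming))                        # shipped rename applied
print(overlay_record(base, {**base, "name": "Blu-ray Disc"}, incoming))  # local rename kept
print(overlay_record(base, None, incoming))                              # stays deleted
```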


11 Comments

  1. Wayne Schneider Stephen Pampell Guys, in Proposal Two, for the Reference data type, the proposed behavior is Overlay – can you provide an example of the expected behavior for, say, Material Types? Assuming Inventory v1.0.0 is installed, what should happen when:

    1) new material type is added (e.g "Ultra HD Blu-ray") with version 1.1.0

    a) users did not modify existing MTs

    b) users added an item with similar purpose (e.g "UHD Blu-Ray")

    2) existing material type is removed (e.g "HD DVD") with version 1.2.0

    a) and it is not in use in the system

    b) it is in use in the system

    3) existing material type is modified (e.g "BD-ROM" → "Blu-ray") with version 1.3.0

    a) it is not in use in the system

    b) it is in use in the system

    1. Jakub Skoczen my expectations are below, they would need to be verified with SMEs. In all cases, I would expect any changes to reference data to be documented in the release notes (which I am sure does not happen now). I added a few more things to consider:

      1. new material type is added (e.g "Ultra HD Blu-ray") with version 1.1.0
        1. users did not modify existing MTs Add new MT
        2. users added an item with similar purpose (e.g "UHD Blu-Ray") If the similar purpose is purely semantic (e.g., matching label), add new MT, the user can decide what to do. If the similar MT and the new MT cause a constraint violation (e.g., the record type has a "code" property with a uniqueness constraint), the new MT would fail to load and an error would be logged (but the upgrade should not abort)
      2. existing material type is removed (e.g "HD DVD") with version 1.2.0
        1. and it is not in use in the system Remove the MT
        2. it is in use in the system Fail to remove the MT and log an error (but do not abort)
      3. existing material type is modified (e.g "BD-ROM" → "Blu-ray") with version 1.3.0 Assuming here we are talking about things like label updates, not new properties or enum-controlled properties. New properties should be added. Enum-controlled properties are a little trickier, if they can be user-modified.
        1. it is not in use in the system Update the MT unless the user has updated the modified property
        2. it is in use in the system Update the MT unless the user has updated the modified property
        3. it has been deleted by the user Do not load MT
      4. existing material type is not modified by upgrade
        1. user has not modified MT No update necessary
        2. user has modified MT Do not overwrite user-updated properties
        3. user has deleted MT Do not load MT
      1. Wayne Schneider

        and it is not in use in the system Remove the MT

        it is in use in the system Fail to remove the MT and log an error (but do not abort)

        Does that mean that the post condition of an upgrade that is intended to delete a record is that it either was deleted or it wasn't? And this means that the system cannot rely on the record having been removed when put back into use (or the next upgrade is performed)?

        1. Marc Johnson It seems to me as an operator that if a record can be "in use" (that is, is referred to by other records in the system), an upgrade that proposes to remove it must take that into account. There are certainly multiple ways to deal with that situation. For example, we could decide that upgrades cannot delete reference data at all. We could deprecate obsolete reference data, leaving it in use with a warning to the operator. We could allow for deletion, but fail if a record is "in use" and log an error or warning, as I proposed above. We could allow for deletion with some kind of clean up of related records and log an error or warning. The question of whether the system can rely on the record having been removed is a function of how we choose to deal with data that are in-use.

          It may be that a hard and fast policy on this is not necessary, as long as behavior is well-documented. It likely depends on the domain and the particular categories that any given reference data represent. Input from SMEs seems critical here.

          1. It seems to me as an operator that if a record can be "in use" (that is, is referred to by other records in the system), an upgrade that proposes to remove it must take that into account.

            Agreed

            We could allow for deletion, but fail if a record is "in use" and log an error or warning, as I proposed above.

            By failure, are you referring to the idea that each step could fail? Any of those failures might be tolerable, and not be considered an overall failure.

            The question of whether the system can rely on the record having been removed is a function of how we choose to deal with data that are in-use.

            Agreed. I think what we decide may have significant implications for the use of the system following an upgrade. The system may need to be aware that steps it expected to occur, have not, and compensate appropriately.

            Input from SMEs seems critical here.

            Couldn't agree more.

            1. By failure, are you referring to the idea that each step could fail? Any of those failures might be tolerable, and not be considered an overall failure.

              Yes, exactly, thank you for clarifying.

          2. It seems to me as an operator that if a record can be "in use" (that is, is referred to by other records in the system), an upgrade that proposes to remove it must take that into account.

            What do you consider to be in-use? Referenced by records within the same module or anywhere in the system as a whole?

            We could deprecate obsolete reference data, leaving it in use with a warning to the operator

            The operator being the person using the UI for maintaining this reference data, a client to the API for maintaining this reference data or the person operationally responsible for the module upgrade?

            It may be that a hard and fast policy on this is not necessary, as long as behavior is well-documented

            If it is actively allowed to vary between sets of reference records, that suggests it will need to be documented on a per set basis?

    2. Jakub Skoczen Thanks for bringing up a real-life example. I completely agree with Wayne: any changes made by the script should be documented.

      I wonder in which cases a library will set the loadReference parameter. Thinking from the library's perspective in a production system, for this example data usually no changes would be requested, so the parameter would not be set?

      1. Marko Knepper in my mind, the use case for setting loadReference=true on an upgrade is if there are updates to a controlled vocabulary in use by the library, e.g. various terms from RDA, or if a module introduces a new set of reference data to support a new feature, e.g. a new property in a record schema.

  2. Laura Wright Christie Thomas Jason Kovari - I think it will be a good idea to talk through/review these two proposals with the MM-SIG. In Settings > Inventory we currently have, e.g., non-editable system data (formerly known as reference data), such as RDA terms for Resource type and Format, as well as lists of system data which are all editable, e.g. Statistical code types, Instance status types etc.

    Whether a list of system data is maintained in Settings > Inventory or in the Entity Management app might not make any difference when discussing these two proposals - but I could be wrong here.

  3. Charlotte Whitt I do think we should discuss inventory controlled vocabularies. Currently there is a challenge in maintaining external data within FOLIO as reference data over time, as I believe that the overlay is based on the UUID, which may differ from tenant to tenant. If external identifiers were used for storing and managing these entities in FOLIO, this would not be a problem. I also believe that adding local terms to enrich an external vocabulary interferes with the update / overlay process, but I may be mistaken about that.