
FOLIO has two primary ways to represent a choice of values from a set in back-end APIs: enumerations and reference records (there are some exceptions, including unconstrained strings and hardcoded sets).

Context

Historically, properties used primarily as information for people (or for dynamic policies), e.g. instance status or material type, seem to have been represented as reference records. States or types that the system needs to interpret semantically, e.g. item status, order type or order workflow status, have been represented as enumerations.

Recently, there has been increasing interest in representing the latter as reference records as well. This document is intended to outline the considerations for such an approach.

Challenges

How should FOLIO define a set of values which can be changed by a tenant over time and associate behaviour of the system to specific members of the set?

Should FOLIO follow a single convention for modelling sets of values?

Design

Enumerations

Enumerations are defined directly with the property that they constrain. For example, this is the definition of the possible item status names:

Item Status Names
{
  "name":{
    "description":"Name of the status e.g. Available, Checked out, In transit",
    "type":"string",
    "enum":[
      "Available",
      "Awaiting pickup",
      "Awaiting delivery",
      "Checked out",
      "In process",
      "In transit",
      "Missing",
      "On order",
      "Paged",
      "Declared lost",
      "Order closed",
      "Claimed returned",
      "Unknown",
      "Withdrawn",
      "Lost and paid",
      "Aged to lost"
    ]
  }
}
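To illustrate what this schema constraint means for a consumer, the sketch below mirrors the item status names in a Python `Enum` and validates a status name against it. The `ItemStatusName` class and `validate_status_name` function are hypothetical; in FOLIO the constraint is enforced by the JSON schema above, not by code like this.

```python
from enum import Enum


class ItemStatusName(Enum):
    """Hypothetical mirror of the item status "name" enumeration.

    The members are fixed in code, just as they are fixed in the schema
    for a given interface version.
    """
    AVAILABLE = "Available"
    AWAITING_PICKUP = "Awaiting pickup"
    AWAITING_DELIVERY = "Awaiting delivery"
    CHECKED_OUT = "Checked out"
    IN_PROCESS = "In process"
    IN_TRANSIT = "In transit"
    MISSING = "Missing"
    ON_ORDER = "On order"
    PAGED = "Paged"
    DECLARED_LOST = "Declared lost"
    ORDER_CLOSED = "Order closed"
    CLAIMED_RETURNED = "Claimed returned"
    UNKNOWN = "Unknown"
    WITHDRAWN = "Withdrawn"
    LOST_AND_PAID = "Lost and paid"
    AGED_TO_LOST = "Aged to lost"


def validate_status_name(name: str) -> ItemStatusName:
    """Return the member whose value is `name`, or raise ValueError."""
    # Looking an Enum up by value raises ValueError when no member matches
    return ItemStatusName(name)
```

Here `validate_status_name("Checked out")` returns `ItemStatusName.CHECKED_OUT`, while a name outside the list, such as `"Lost"`, raises `ValueError`. Adding a status means changing the code (and the schema), which is why the members are fixed for any given module version.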


Reference Records

Use of reference records is made up of two parts:

  • the set of reference records themselves (with their own API)
  • a property in the record that refers to a member of the reference records by ID

Instance type is an example of this: the API for the reference records is defined here and the referring property in an instance is defined here. There is a corresponding database table and foreign key, respectively.
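A minimal in-memory sketch of these two parts, assuming UUID identifiers: a set of instance type reference records, and an instance whose `instanceTypeId` property refers to one of them by ID. The `ReferenceRecord` and `ReferenceRecords` classes and the `validate_instance` function are hypothetical, and the membership check only stands in for the database foreign key; it does not reproduce how FOLIO modules implement it.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class ReferenceRecord:
    """One member of a reference record set (e.g. an instance type)."""
    name: str
    code: str  # example of a descriptive property beyond the name
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ReferenceRecords:
    """Hypothetical store for one set of reference records, keyed by ID."""

    def __init__(self) -> None:
        self._records: dict[str, ReferenceRecord] = {}

    def add(self, record: ReferenceRecord) -> ReferenceRecord:
        self._records[record.id] = record
        return record

    def get(self, record_id: str | None) -> ReferenceRecord | None:
        return self._records.get(record_id)


def validate_instance(instance: dict, instance_types: ReferenceRecords) -> None:
    """Reject an instance whose instanceTypeId does not refer to a known
    instance type; this plays the role of the foreign key constraint."""
    type_id = instance.get("instanceTypeId")
    if instance_types.get(type_id) is None:
        raise ValueError(f"unknown instance type {type_id}")
```

For example, after `text = instance_types.add(ReferenceRecord(name="text", code="txt"))`, an instance containing `{"instanceTypeId": text.id}` passes validation, while one referring to any other UUID is rejected.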

Characteristics

The two approaches have very different characteristics, particularly in where they are defined, how they are referred to and how they can change.

Enumerations

  • Are referred to by name
  • Are defined by an interface
  • Members are fixed for any given module version
  • Clients cannot dynamically get the members of the set
  • Are only a name, cannot have other descriptive properties associated with them
  • Can only be referenced by a single property

Reference Records

  • Are referred to by ID
  • Are defined by the implementation
  • Members may change independently of module version
  • Clients can dynamically get the members of the set
  • Can have additional descriptive properties (beyond the name) associated with them
  • Can be referenced by multiple properties
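To make the runtime characteristics of reference records concrete, the sketch below lets a tenant add a member after deployment, lets a client list the current members instead of hard-coding them, and has two properties of one record refer to the same set. It uses loan types, with an item's `permanentLoanTypeId` and `temporaryLoanTypeId` as the two referring properties; the `LoanTypes` class and `validate_item` function are hypothetical and only model the idea in memory.

```python
from __future__ import annotations

import uuid


class LoanTypes:
    """Hypothetical in-memory set of loan type reference records."""

    def __init__(self) -> None:
        self._by_id: dict[str, dict] = {}

    def create(self, name: str) -> str:
        """Add a member at runtime (e.g. via the API) and return its ID."""
        record_id = str(uuid.uuid4())
        self._by_id[record_id] = {"id": record_id, "name": name}
        return record_id

    def list_names(self) -> list[str]:
        """What a client sees when it fetches the members dynamically."""
        return sorted(record["name"] for record in self._by_id.values())

    def exists(self, record_id: str | None) -> bool:
        return record_id is not None and record_id in self._by_id


def validate_item(item: dict, loan_types: LoanTypes) -> None:
    """Both loan type properties of an item refer to the same set."""
    for prop in ("permanentLoanTypeId", "temporaryLoanTypeId"):
        # only check the properties that are present on this item
        if prop in item and not loan_types.exists(item[prop]):
            raise ValueError(f"{prop} does not refer to a known loan type")
```

For example, a tenant can call `loan_types.create("Reading room")` at any time, and the next call to `list_names()` includes the new member without any change to client code.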

Comparison


                  | Defined by     | Referred to by | Stability of values | Descriptive properties
Enumerations      | interface      | name           | Static/fixed        | No
Reference Records | implementation | ID             | Dynamic/can change  | Yes


Worked Example - Holdings Source

This feature is currently being worked on by the Core Functional team, in preparation for the FoliJet team disallowing editing of holdings that are based upon imported MARC files.

Behaviour

  • The source is assigned by different processes
    • For holdings created during data import, the source should be MARC
    • For holdings created any other way, the source should be FOLIO
  • Instances can be filtered by whether they have holdings with a given source
  • The source dictates whether holdings can be edited
    • Holdings with a MARC source cannot be edited directly
    • Otherwise holdings can be edited directly
  • Tenants may want to define their own sources in the future
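Under the enumeration approach, the behaviour above could be sketched as follows. The `HoldingsSource` enum, the `source_for`, `can_edit_directly` and `instances_with_source` functions, and the `created_by_data_import` flag are all hypothetical; the point is that both the set of sources and the editing rule are fixed in code for a given interface version.

```python
from __future__ import annotations

from enum import Enum


class HoldingsSource(Enum):
    """Hypothetical enumeration of holdings sources."""
    FOLIO = "FOLIO"
    MARC = "MARC"


def source_for(created_by_data_import: bool) -> HoldingsSource:
    """Data import assigns MARC; every other process assigns FOLIO."""
    return HoldingsSource.MARC if created_by_data_import else HoldingsSource.FOLIO


def can_edit_directly(source: HoldingsSource) -> bool:
    """Holdings with a MARC source cannot be edited directly."""
    return source is not HoldingsSource.MARC


def instances_with_source(instances: list[dict], source: HoldingsSource) -> list[dict]:
    """Keep instances that have at least one holdings record with `source`."""
    return [
        instance
        for instance in instances
        if any(h["source"] is source for h in instance["holdings"])
    ]
```

A tenant-defined source would require a new enum member, and therefore a new interface version; this is the limitation the comparison that follows explores.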

Comparison

The table below compares how the characteristics of each approach affect this behaviour (and related behaviour).


Situation | Enumeration | Reference Records
When importing a MARC file, how does data import know the correct source to use? | Has to be hardcoded based upon the dependent interface version | Needs to be identified during the process (and might not exist)
When creating a record, how does the reference UI know the correct source to use? | Has to be hardcoded based upon the dependent interface version | Needs to be identified during the process (and might not exist)
When presenting the search options, how does the reference UI know what source values can be used? | Has to be hardcoded based upon the dependent interface version | Can be fetched via the API entirely at runtime
How does the reference UI know whether a holdings record is editable? | Sources that are editable have to be hardcoded based upon the dependent interface version | Clients need a way of identifying which sources are editable
How can a tenant add a new source? | They cannot; sources can only be added for all implementations via the interface | Via the API (or possibly a settings page in the reference UI)
If a new source is added, how does the system know if holdings with that source are editable? | As the list is fixed, this is decided during development | Clients need a way of identifying which sources are editable
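For the reference record approach, one possible answer to the "needs to be identified" and "clients need a way" cells is sketched below. It assumes each holdings source record carries a hypothetical `editable` property, that holdings refer to their source through a `sourceId` property, and that data import identifies the MARC source by its name. None of these is an agreed FOLIO design; how to identify a specific member is one of the open questions in this document.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class HoldingsSourceRecord:
    """Hypothetical holdings source reference record."""
    name: str
    editable: bool  # hypothetical descriptive property
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class SourceNotFound(LookupError):
    """Raised when the expected source record does not exist for a tenant."""


def find_source_by_name(sources: list[HoldingsSourceRecord], name: str) -> HoldingsSourceRecord:
    """Identify a specific member at runtime; it might not exist."""
    for source in sources:
        if source.name == name:
            return source
    raise SourceNotFound(f"no holdings source named {name!r}")


def can_edit_directly(holdings: dict, sources: list[HoldingsSourceRecord]) -> bool:
    """Decide editability from the referenced record, not a hardcoded list."""
    by_id = {source.id: source for source in sources}
    source = by_id.get(holdings.get("sourceId"))
    # hypothetical policy: holdings with an unknown source are not editable
    return source is not None and source.editable
```

With this shape, a tenant who adds a hypothetical `HoldingsSourceRecord(name="Local", editable=True)` gets editable holdings without a code change, but data import still fails with `SourceNotFound` if the `MARC` record has been removed for that tenant.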


Impact of the characteristics

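When using reference records, the need to identify a specific member of the set (e.g. the checked out status) creates a significant risk of implementation coupling between modules, for example a client module depending upon a record that only one implementation of an interface provides. This could undermine the ability to replace the module that implements an interface within FOLIO.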

If the set of reference records can be changed for a tenant (one of their powerful characteristics), then there needs to be an understanding of how that affects other records or systems that refer to members of the set, e.g. what happens if a status or type no longer exists?
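One possible policy for the removal question is to refuse to delete a member that is still in use, sketched below under the assumption that the module can see the records referring to the set. The `delete_reference_record` function and `ReferenceInUse` exception are hypothetical, and other policies (such as asking the user what should happen to the referring records) are equally plausible.

```python
from __future__ import annotations


class ReferenceInUse(Exception):
    """Raised when deleting a member that other records still refer to."""


def delete_reference_record(
    reference_records: dict[str, dict],
    referring_records: list[dict],
    property_name: str,
    record_id: str,
) -> None:
    """Delete a reference record only if no referring record uses it."""
    in_use = sum(1 for r in referring_records if r.get(property_name) == record_id)
    if in_use:
        raise ReferenceInUse(f"{record_id} is referred to by {in_use} record(s)")
    # nothing refers to the member, so removing it leaves no dangling IDs
    reference_records.pop(record_id, None)
```

In a real module the check and the delete would need to happen atomically (for example in a single database transaction, or by relying on the foreign key), otherwise a record could start referring to the member between the two steps.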

Open Questions

  • Are there any other options that have not been considered above?
  • How do client modules identify a specific member of the set in order to attach meaning or semantics to it?
  • What should happen if no reference record can be found using the chosen identification?

Summary

There are significant trade-offs to both approaches. This becomes especially challenging when the members of the set need both to be changed by a tenant over time and to have specific behaviour associated with them.

Related work

The Technical Council and Sys Ops SIG have been discussing how reference records should work (e.g. should users be able to change them?) and how upgrades should affect them (e.g. should a change to a module's default definition of reference records overwrite changes made by a user?).


4 Comments

  1. I personally feel that this isn't a one-size-fits-all type of thing.  There are scenarios where each of the approaches makes the most sense.  For instance, if you're modelling a finite state machine, e.g. order status, an enumeration makes things a lot easier to manage.  The system knows that an order can only be in one of these N predefined states, and it knows if/when/how an order transitions from one state to another.  If you make the set of states dynamic, you need to handle cases where a state that's currently in use is removed.  What happens to the orders that have the removed state?  The system needs to either detect and disallow this, allow the user to instruct the system on what to do, or essentially guess what to do.  Note that a variation of this situation can still happen with enumerations, if the enumeration changes, but at least it's known when the change will be happening (during upgrade), and developers can implement migration/upgrade scripts to take appropriate action.

    1. I personally feel that this isn't a one-size-fits-all type of thing.

      That's fair. My comments along the lines of "let's just always use reference records" are aimed at making an end run around thinking about this problem ever again, i.e. it's true that this may not be the simplest solution everywhere. But we would gain by picking a consistent solution everywhere, because we wouldn't have to think about how to design something (when building it) nor how to use it (when consuming it).

      Additionally, tossing away enums in favor of reference records makes the "related values" accessible via the API, which as a consumer of that API is beneficial to me. 

  2. It will also be easier to manage l10n for values if everything has a UUID and therefore can use a common API. If we have some enums and some reference records, we need to do more work to figure out how to extract a translation for a given value. Assuming such a translation module exists for server-side values, using reference records also makes it easier to be certain all values on a list have been translated. How can this be done for enums? 

  3. There are conversations happening around reference data and upgrades right now that start to veer into some of this topic's territory.  Specifically, one idea discussed is to introduce a new class of data, "system data", that's immutable.  I think this would help here, in that it would give the stability of an enum because these records will always be there, meaning we don't need the enums anymore.

    To be clear, no decisions have been made yet.  I just felt it was worth mentioning.