Representation of Sets in back end APIs

FOLIO primarily has two ways to represent the choice of values from a set in back end APIs: enumerations and reference records (there are some exceptions, including unconstrained strings and hard coded sets).

Context

Historically, properties used primarily as information for people (or for dynamic policies), e.g. instance status or material type, seem to have been represented as reference records, while states or types that the system needs to interpret semantically, e.g. item status, order type or order workflow status, have been represented as enumerations.

Recently, there has been increasing interest in representing the latter as reference records as well. This document is intended to outline the considerations for such an approach.

Challenges

How should FOLIO define a set of values which can be changed by a tenant over time and associate behaviour of the system to specific members of the set?

Should FOLIO follow a single convention for modelling sets of values?

Design

Enumerations

Enumerations are defined directly with the property that they constrain. For example, this is the definition of the possible item status names:

Item Status Names

{
  "name":{
    "description":"Name of the status e.g. Available, Checked out, In transit",
    "type":"string",
    "enum":[
      "Available",
      "Awaiting pickup",
      "Awaiting delivery",
      "Checked out",
      "In process",
      "In transit",
      "Missing",
      "On order",
      "Paged",
      "Declared lost",
      "Order closed",
      "Claimed returned",
      "Unknown",
      "Withdrawn",
      "Lost and paid",
      "Aged to lost"
    ]
  }
}
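As a rough illustration of what this constraint means for a client, the sketch below checks a candidate status name against the enumerated values. It is a hand-written check rather than schema validation, and the names `ITEM_STATUS_NAMES` and `is_valid_item_status` are assumptions made for this example only.

```python
# Illustrative only: the values are copied from the enum in the schema above.
# A client built against this interface version has to carry the same fixed list.
ITEM_STATUS_NAMES = frozenset({
    "Available", "Awaiting pickup", "Awaiting delivery", "Checked out",
    "In process", "In transit", "Missing", "On order", "Paged",
    "Declared lost", "Order closed", "Claimed returned", "Unknown",
    "Withdrawn", "Lost and paid", "Aged to lost",
})


def is_valid_item_status(name: str) -> bool:
    """Return True only if name exactly matches one of the enumerated values."""
    return name in ITEM_STATUS_NAMES
```

The match is exact and case sensitive, so `"checked out"` would be rejected even though `"Checked out"` is accepted.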

Reference Records

Use of reference records is made up of two parts:

the set of reference records themselves (with their own API)
a property in the record that refers to a member of the reference records by ID

Instance type is an example of this: the API for the reference records is defined here and the referring property in an instance is defined here. These correspond to a database table and a foreign key respectively.
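The two parts can be sketched as follows. This is a simplified model under stated assumptions, not the actual FOLIO schemas: the class names, the field names (`id`, `name`, `instance_type_id`) and the sample IDs are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class InstanceType:
    """A reference record (hypothetical shape: only an ID and a name)."""
    id: str
    name: str


@dataclass(frozen=True)
class Instance:
    """A record that refers to a reference record by ID only."""
    title: str
    instance_type_id: str


def find_instance_type(
    types_by_id: Dict[str, InstanceType], instance: Instance
) -> Optional[InstanceType]:
    """Resolve the reference; returns None if no such reference record exists."""
    return types_by_id.get(instance.instance_type_id)


# Made-up reference records, keyed by ID as a stand-in for the reference record API.
TYPES_BY_ID = {
    t.id: t
    for t in (InstanceType("type-1", "text"), InstanceType("type-2", "still image"))
}
```

Because the instance stores only the ID, the name (and any other descriptive property) has to be looked up separately, and that lookup can fail.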

Characteristics

The two approaches have very different characteristics, particularly in where they are defined, how they are referred to, and how they can change.

Enumerations

Are referred to by name
Are defined by an interface
Members are fixed for any given module version
Clients cannot dynamically get the members of the set
Are only a name, cannot have other descriptive properties associated with them
Can only be referenced by a single property

Reference Records

Are referred to by ID
Are defined by the implementation
Members may change independently of module version
Clients can dynamically get the members of the set
Can have additional descriptive properties (beyond the name) associated with them
Can be referenced by multiple properties

Comparison

	Defined by	Referred to by	Stability of values	Descriptive properties
Enumerations	Interface	Name	Static/fixed	No
Reference Records	Implementation	ID	Dynamic/can change	Yes

Worked Example - Holdings Source

This feature is being worked on at present by the Core Functional team in preparation for the FoliJet team to disallow editing of holdings that are based upon imported MARC files.

Behaviour

The source is assigned by different processes
- For holdings created during data import, the source should be MARC
- For holdings created any other way, the source should be FOLIO
Instances can be filtered by whether they have holdings with that source
The source dictates whether holdings can be edited
- Holdings with a MARC source cannot be edited directly
- Otherwise holdings can be edited directly
Tenants may want to define their own sources in the future
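The assignment and editing rules above can be sketched as two small functions. This is a minimal sketch, assuming the source is carried as a plain string; the function names are hypothetical and not taken from any FOLIO module.

```python
MARC = "MARC"
FOLIO = "FOLIO"


def source_for_new_holdings(created_by_data_import: bool) -> str:
    """Holdings created during data import get MARC; all others get FOLIO."""
    return MARC if created_by_data_import else FOLIO


def can_edit_directly(source: str) -> bool:
    """Only holdings with a MARC source are blocked from direct editing."""
    return source != MARC
```

Read literally, the "otherwise" rule means that any source a tenant defines in the future would be editable by default. Whether that is the intended behaviour is one of the things the comparison below has to address.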

Comparison

Below is a comparison of how the characteristics affect this (and related) behaviour.

Situation	Enumeration	Reference Records
When importing a MARC file, how does data import know the correct source to use?	Has to be hardcoded based upon the dependent interface version	Needs to be identified during the process (and might not exist)
When creating a record, how does the reference UI know the correct source to use?	Has to be hardcoded based upon the dependent interface version	Needs to be identified during the process (and might not exist)
When presenting the search options, how does the reference UI know what source values can be used?	Has to be hardcoded based upon the dependent interface version	Can be fetched via the API entirely at runtime
How does the reference UI know whether a holdings record is editable?	Sources that are editable have to be hardcoded based upon the dependent interface version	Clients need a way of identifying which sources are editable
How can a tenant add a new source?	They cannot; sources can only be added for all implementations via the interface	Via the API (or possibly a settings page in the reference UI)
If a new source is added, how does the system know if holdings with that source are editable?	As the list is fixed, this is decided during development	Clients need a way of identifying which sources are editable
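To make the "needs to be identified during the process (and might not exist)" cells concrete, the sketch below looks up a source among reference records fetched at runtime. Identifying the record by its name is only one possible convention, chosen here as an assumption. The record shape (`id`, `name`), the sample IDs and the function name `find_source_id` are likewise hypothetical.

```python
from typing import Dict, List, Optional


def find_source_id(sources: List[Dict[str, str]], name: str) -> Optional[str]:
    """Return the ID of the source with this name, or None when it is absent."""
    for source in sources:
        if source.get("name") == name:
            return source.get("id")
    return None


# Stand-in for the records a client would fetch from the reference record API.
fetched_sources = [
    {"id": "a1", "name": "FOLIO"},
    {"id": "b2", "name": "MARC"},
]

marc_id = find_source_id(fetched_sources, "MARC")    # found
local_id = find_source_id(fetched_sources, "Local")  # None: caller must decide what to do
```

The client still has to hardcode something (here, the name), and it has to handle the case where a tenant has renamed or removed the record.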

Impact of the characteristics

When using reference records, the need to identify a specific member of the set (e.g. the checked out status) presents a significant possibility of implementation coupling between modules. This could undermine the replacement of interfaces within FOLIO.

If the set of reference records can be changed for a tenant (one of the powerful characteristics), then there needs to be an understanding of how that affects other records or systems that refer to members of the set, e.g. what happens if a status or type no longer exists?
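One way to see the effect of removing a member is to look for records whose reference no longer resolves. The sketch below is illustrative only: the function name `find_dangling`, the property name `sourceId` and the sample data are assumptions, and it does not suggest what FOLIO should do with the records it finds.

```python
from typing import Dict, Iterable, List, Set


def find_dangling(
    records: Iterable[Dict[str, str]], property_name: str, known_ids: Set[str]
) -> List[Dict[str, str]]:
    """Return records whose reference is missing or points to an unknown ID."""
    return [r for r in records if r.get(property_name) not in known_ids]


# Made-up holdings records; "sourceId" is a hypothetical referring property.
holdings = [
    {"id": "h1", "sourceId": "a1"},
    {"id": "h2", "sourceId": "zz"},  # refers to a source that was deleted
]
remaining_source_ids = {"a1", "b2"}

dangling = find_dangling(holdings, "sourceId", remaining_source_ids)
```

With enumerations this situation can only arise across module versions; with reference records it can arise at any time for a single tenant.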

Open Questions

Are there any other options that have not been considered above?
How do client modules identify a specific member of the set in order to attach meaning or semantics to it?
What should happen if no reference record can be found using the chosen identification?

Summary

There are significant trade-offs to both approaches. These become especially challenging when the members of the set both need to be changed by a tenant over time and have specific behaviour associated with them.

Related work

The Technical Council and Sys Ops SIG have been discussing how reference records should work (e.g. should users be able to change them?) and how upgrades should affect them (e.g. should a change to the module's default definition of reference records overwrite changes made by a user?).