Entities App: Vision

Last Updated: 2020-09-22 by members of the Entity Management Working Group

Overview

The Entities Management Application (Entities App) is the module of the FOLIO architecture where entity data is managed, created, cached and published. This application provides the functionality to support entity management, integrate both external and internal linked data sources across FOLIO applications and provide one component of the requirements for facilitating entity-based data models, such as BIBFRAME. This will be FOLIO’s central place for managing controlled entities and supporting any metadata schema that FOLIO stores and supports. 

In order to facilitate other functions, the Entities App must cache external entities and store internal entities. External entities, such as Library of Congress (LC) Subject Headings, the LC Name Authority File (LCNAF), Getty Vocabularies, RBMS Vocabularies, or any web-based thesauri that have linked data endpoints, may be cached internally so that data can be controlled within MARC SRS, Inventory and other data stores to be developed within FOLIO.

The Entities App will also support locally defined entities which will be created, maintained, and published with persistent URIs. Examples of local management of entities may include an institution's local entities that are not represented by an existing external entity data source, or the creation of local data about an external entity (e.g.: adding alternative labels for problematic subject headings).

The Entity Management WG acknowledges the Codex Vision and that there is some overlap between that vision and the concept of Entity Management. However, the Entities App is envisaged to implement specific functions to enable the management of entities within FOLIO; it neither restricts itself to the Codex Vision nor requires any part of the Codex Vision to be delivered.

Within the scope of authority management, the Entities App engages with data stored elsewhere in FOLIO and third-party lookup services in a number of ways. First, the Entities App holds the data-of-record for controlled headings used within FOLIO. This includes data such as headings stored in 1XX, 6XX, 7XX, etc. fields in MARC SRS bibliographic records as well as reference data currently managed in settings within FOLIO (e.g.: resource type and carrier type).

Entity Types

  • Agents: Persons, Organizations and other entities that contribute to or are otherwise related to a resource. Examples of source data include LCNAF, Getty Union List of Artist Names, etc.
  • Genres: Categorical groupings that represent the stylistic components or form of the resource. Examples of source data include RBMS Vocabularies, Getty Art & Architecture Thesaurus, etc.
  • Geographic: Physical or administrative regions that may be associated with the publication or subject of a resource. Examples of source data include GeoNames.
  • Subjects: Conceptual entities that denote an "aboutness" for resources managed in FOLIO. Examples of source data include LCSH, FAST, etc.
  • Works: A bibliographic description set that can be used to group together related Instances. Examples of source data include LC BIBFRAME Works, LC Hubs (clustered works), OCLC Works, SHARE-VDE Works, SHARE-VDE Opus (clustered works), etc. By using set-theoretic modeling, different Work entities can have different attributes of Work, including which properties are associated with their Work concepts. Set theory would also allow flexible modeling where Instance attributes can be shared with Work attributes, as needed. Works may be hand-crafted or aggregated and clustered.
  • Other entities as appropriate/needed to support library activities, including, for example, MARC and RDA vocabularies such as the MARC Code List for Relators <https://www.loc.gov/marc/relators/relaterm.html>

Note: data may derive from multiple sources and may not align with FOLIO modeling; as such, we anticipate multiple source-data-to-FOLIO mappings for each of the entity types.
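
As one illustration of the note above, the following Python sketch shows how records from two differently shaped sources might be mapped into a single local entity shape. The Entity fields, the EntityType values, the input dictionary keys, the source names, and both mapping functions are hypothetical; they are not FOLIO's data model or any real source's export format.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    AGENT = "agent"
    GENRE = "genre"
    GEOGRAPHIC = "geographic"
    SUBJECT = "subject"
    WORK = "work"


@dataclass
class Entity:
    """Hypothetical local shape shared by all entity types."""
    uri: str
    entity_type: EntityType
    preferred_label: str
    alternative_labels: list = field(default_factory=list)
    source: str = "local"


def map_subject_source(record: dict) -> Entity:
    # Assumed input keys: "id", "prefLabel", "altLabels".
    return Entity(
        uri=record["id"],
        entity_type=EntityType.SUBJECT,
        preferred_label=record["prefLabel"],
        alternative_labels=list(record.get("altLabels", [])),
        source="subject-source",
    )


def map_place_source(record: dict) -> Entity:
    # Assumed input keys: "uri", "name", "alternateNames".
    return Entity(
        uri=record["uri"],
        entity_type=EntityType.GEOGRAPHIC,
        preferred_label=record["name"],
        alternative_labels=list(record.get("alternateNames", [])),
        source="place-source",
    )


# One mapping function per source, selected by a source key.
MAPPINGS = {
    "subject-source": map_subject_source,
    "place-source": map_place_source,
}


def to_entity(source: str, record: dict) -> Entity:
    """Convert a source record into the local Entity shape."""
    return MAPPINGS[source](record)
```

Keeping one mapping function per source means a new source can be added without touching the shared Entity shape, which is the situation the note anticipates.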


User Stories

Local Data: As a cataloger, I want to create and manage a local entity when no external data source includes a relevant entity so that I may either create an entity that can be reused in other records or datasets OR can meet the requirements of an entified data model (e.g.: BIBFRAME) that expects an entity for certain data types, such as agents, subjects, etc.

  • Sub Use Case: The ability to publish the local entity as linked data.

Support Discovery: As a developer building a user discovery environment, I need the ability to cache/integrate external data in order to index and create displays across multiple datasets (e.g. display author information in bibliographic data displays, or search on author and bibliographic data together).

Cross App Data Integration: As a FOLIO user, I need the ability to view cached/integrated external data in order to identify entities and discover resources related to entities across multiple datasets and different FOLIO applications (e.g. display author information in bibliographic data displays, search on author and bibliographic data together, leverage work entity data in the Inventory App to know what versions of a work are in our collection).

Change Management: As a technical services staff member, I need the ability to update local descriptions based on changes to external data (e.g. update MARC values if a preferred label for an entity changes) so that the display values in FOLIO apps and the external discovery environment stay current with the external dataset and with other institutions using those data. This can happen via automated means for straightforward updates or a manual process for complex changes (e.g.: authority splits, etc.).
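
The split between automated and manual handling described in this story could be sketched as follows. This is a minimal Python illustration under assumed names: the Change class, its "relabel" and "split" kinds, and apply_changes are all hypothetical and do not describe an existing FOLIO interface.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A hypothetical change notice derived from an external data source."""
    uri: str
    kind: str                      # assumed values: "relabel" or "split"
    new_label: str = ""
    successor_uris: tuple = ()


def apply_changes(labels_by_uri, changes):
    """Apply simple relabels automatically; queue everything else for review.

    labels_by_uri maps an entity URI to the label currently used locally.
    Returns the list of changes that need a manual decision.
    """
    needs_review = []
    for change in changes:
        if change.kind == "relabel" and change.uri in labels_by_uri:
            labels_by_uri[change.uri] = change.new_label
        else:
            # Splits, unrecognized kinds, and unknown URIs need a cataloger's judgment.
            needs_review.append(change)
    return needs_review
```

Anything the function cannot apply safely is returned rather than dropped, so a review queue can be built from its result.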

Local Values: As a library staff member, I need the ability to override racist or otherwise problematic external vocabulary terms with inclusive labels so that the data in the FOLIO implementation and external discovery environment reflect the values of the institution. 
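
One way to picture this story is a display-time lookup in which a local label, when present, takes precedence over the external preferred label while the cached external data stays untouched. The function and both dictionaries below are hypothetical names used only for illustration.

```python
def display_label(entity_uri, external_labels, local_overrides):
    """Return the label to display for an entity.

    A local override, when present, wins over the external preferred label;
    the external label itself is left unchanged in external_labels.
    """
    if entity_uri in local_overrides:
        return local_overrides[entity_uri]
    return external_labels[entity_uri]
```

Because the override lives in a separate mapping, a later refresh of the external data would not overwrite the institution's chosen label.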

3rd Party Lookups: As a cataloger, I need the ability to search via a third party lookup service/API to connect FOLIO data with external vocabularies/entity descriptions so that I may easily reuse external data when creating data in the FOLIO environment. This work may entail original cataloging in MARCcat, Inventory or future non-MARC metadata editors; further, this work may entail enhancement of existing FOLIO data.

Strings to Things: As a technical services staff member, I need the ability to identify values not linked to a local or external entity in order to perform authority control so that all references to a particular entity use the same display value and are linked to the same URI; this has direct benefit to searching, filtering and cross-linking in both FOLIO as well as external discovery environments.
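
A rough sketch of the matching step behind this story is shown below: headings are normalized and compared against known entity labels, then sorted into matched, ambiguous, and unlinked groups. The normalization rules and the report layout are assumptions chosen for illustration; real authority matching would need considerably more care.

```python
import unicodedata
from collections import defaultdict


def normalize(heading: str) -> str:
    """Fold case, Unicode form, and whitespace, and drop trailing punctuation."""
    text = unicodedata.normalize("NFKC", heading).casefold()
    return " ".join(text.split()).rstrip(" .,;:")


def classify_headings(headings, entities):
    """Sort headings into matched, ambiguous, and unlinked.

    entities is an iterable of (uri, label) pairs.
    """
    index = defaultdict(set)
    for uri, label in entities:
        index[normalize(label)].add(uri)

    report = {"matched": {}, "ambiguous": {}, "unlinked": []}
    for heading in headings:
        uris = index.get(normalize(heading), set())
        if len(uris) == 1:
            report["matched"][heading] = next(iter(uris))
        elif uris:
            report["ambiguous"][heading] = sorted(uris)
        else:
            report["unlinked"].append(heading)
    return report
```

The "unlinked" and "ambiguous" groups are the ones a staff member would work through when performing authority control.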

Work Instance Clusters: As a public services or technical services staff member, I need the ability to leverage linkages between different entities to perform actions within FOLIO (e.g. find all Instances of a Work so that I can submit a request for any Instance of the selected Work, or identify all Works on a particular subject).

Collection Intelligence: As a collection development librarian, I would like to use entity data to query our collections so that I can replace lost or missing items or versions of a work, analyze collections for print retention strategies, know what resources we have from local authors, etc.

Extensible Modeling: As a library staff member managing the FOLIO environment, I need the ability to define and extend the model used to describe a particular entity type, which may involve terms from a variety of ontologies (e.g. use BIBFRAME and related/emerging extensions for Works) so that we can richly describe entities, and map and import external data from multiple datasets related to the concept (e.g.: LC Hubs, SHARE-VDE Opus and OCLC Works can each be used as source data for the Works entity type, ideally abstracted into a local model).


Conceptual Diagram & Architectural Considerations

The diagram above represents a high-level view of how the Entities backend application might fit into the FOLIO architecture.

  • The Entities app provides storage and an extensible data model for entity data (Agents, Subjects, Works, etc.) that are referred to by multiple records.
  • The Entities data model is an abstraction layer, primarily designed to provide access to controlled strings, not to full descriptive metadata or LOD representations of the entities.
  • Existing applications such as Inventory and MARCcat include Entities data by reference, so that controlled strings such as preferred labels, headings, or vocabulary terms are looked up when needed (and cached), but the source of authority for those strings resides elsewhere.
  • The Entities app, like the Inventory app, provides for storage, update, and retrieval of the Entities abstracted data model, as well as access to the full descriptive metadata record or LOD graph via Source Record Storage.
  • Existing mechanisms such as Data Import and Data Export are potential applications for entity exchange and update; data exchange may also engage directly with APIs and/or mandate new tooling.
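
The "by reference" idea in the list above can be sketched as a record that stores only entity URIs and resolves them to controlled strings at display time. LabelResolver, the fetch_label callable, and the contributor_uris key are hypothetical names; fetch_label stands in for whatever call to the Entities app is eventually defined.

```python
class LabelResolver:
    """Resolve entity URIs to controlled strings, caching each lookup."""

    def __init__(self, fetch_label):
        # fetch_label is a stand-in for a call to the Entities app.
        self._fetch_label = fetch_label
        self._cache = {}

    def label_for(self, uri):
        if uri not in self._cache:
            self._cache[uri] = self._fetch_label(uri)
        return self._cache[uri]

    def invalidate(self, uri):
        # Intended to be called when the Entities app reports that an entity changed.
        self._cache.pop(uri, None)


def render_contributors(record, resolver):
    """Build display strings for a record that stores only entity URIs."""
    return [resolver.label_for(uri) for uri in record["contributor_uris"]]
```

Because the record never copies the label, a change to the preferred label in the Entities app reaches every referring record once the cached value is invalidated.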


Why a Separate App?

In keeping with FOLIO’s current practice of each application focusing on a large management task or work area, Entity Management will require its own application to manage its specialty. This will support traditional library practices (e.g. authority control) while allowing for extensibility and future needs (e.g. leveraging linked open data). A separate Entities App is appropriate for reasons of scope, scale, and architecture. Because entities are shared resources across other FOLIO apps, and because the Entities App will support a number of external vocabularies for any FOLIO-supported metadata schema, the level of complexity for aggregating entities and managing those entities requires that it exist as its own application in FOLIO. 

The Entities App Vision document authors are aware of an effort by ScanBit to build authority control functions in the MARCcat App. While authority control for MARC data is one component of the vision outlined as part of entity management, that effort does not address the full suite of functions in the entity management user stories. MARC authority control is a short-term stop-gap for select functionalities; further, this effort only benefits data interacting with MARCcat.


Functional Requirements

The FOLIO Entities App will need to support CRUD functionality for entities, as well as the definition of entity types. Further, the Entities App will support CRUD functionality for maintaining relationships between FOLIO stored or cached entities and FOLIO records as well as external entities.

Note: the below should be reviewed and expanded by the Resource Management Special Interest Group (RM-SIG).

Create

  • Define new entities
  • Define new entity types and associated properties
  • Define external data sources that should be cached / stored locally. 
  • Define external data sources for semi-manual update (ingest records into the local data store on a regular basis).
  • Define entity sub-types, including the set of properties associated with that sub-type (e.g. work clusters/super-works).
  • Define relationships between entities (via properties within entities or the definition of an entity type that mediates relationships between entities). 
  • Define relationships between FOLIO records and entities. 
  • Create new local entities of each type. 
  • Create new entities from external or local data sources.

Read/Consume

  • FOLIO search
  • Entity look up at point of FOLIO record creation or editing. 
    • Inventory records
    • MARC records and other bibliographic standards, as available in FOLIO
    • Entity records
    • Order records
    • eHoldings records
    • Requests
    • Others?
  • Publishing local entities for use by external sources.

Update

  • Edit entity records. 
  • Edit local properties in entities that are managed by an external or local data source. 
  • Update entity records in batch from external data sources.
  • Edit relationships between entities.
  • Edit relationships between FOLIO records and entities.
  • Edit definitions for entity types.
  • Edit data sources for entity types.

Delete

  • Delete an entity record.
  • Delete a profile for an entity type.
  • Delete a data source for an entity profile.
  • Delete an entity record based on deletion from external or local data source.
  • Delete a relationship between a FOLIO entity and external or local data source.
  • Delete a relationship between entities.
  • Delete a relationship between a FOLIO record and an entity.
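
To make the create, read, update, and delete lists above more concrete, here is a small in-memory Python sketch covering entities and record-to-entity relationships. EntityStore and its method names are hypothetical, and the rule that a linked entity cannot be deleted is an assumption made for the example, not a stated requirement.

```python
import uuid


class EntityStore:
    """In-memory stand-in for entity and relationship storage."""

    def __init__(self):
        self.entities = {}        # entity id -> dict of properties
        self.links = set()        # (record id, entity id) pairs

    def create(self, entity_type, preferred_label):
        entity_id = str(uuid.uuid4())
        self.entities[entity_id] = {
            "type": entity_type,
            "preferred_label": preferred_label,
        }
        return entity_id

    def read(self, entity_id):
        return self.entities[entity_id]

    def update(self, entity_id, **changes):
        self.entities[entity_id].update(changes)

    def link(self, record_id, entity_id):
        # Only allow relationships to entities that exist.
        if entity_id not in self.entities:
            raise KeyError(entity_id)
        self.links.add((record_id, entity_id))

    def unlink(self, record_id, entity_id):
        self.links.discard((record_id, entity_id))

    def records_for(self, entity_id):
        return sorted(r for r, e in self.links if e == entity_id)

    def delete(self, entity_id):
        # Assumed policy: refuse to delete an entity that records still point to.
        if self.records_for(entity_id):
            raise ValueError("entity is still linked to records")
        del self.entities[entity_id]
```

Tracking relationships separately from entity properties keeps "edit an entity" and "edit a relationship" as distinct operations, mirroring how the lists above separate them.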


Interactions Across FOLIO Applications

The list below includes interactions between the Entities App and other applications within the FOLIO environment, including applications that have been discussed but not yet developed. Further, the Entities App should interact with external discovery environments; while discovery is outside the scope of FOLIO development itself, it is listed here to highlight the need for these data to be exposed to external discovery environments.

Inventory

  • Inventory should be able to display the entity types managed by the Entities App and include relationships between Inventory records and those entities.
  • Inventory should be able to mint relationships between Inventory-stored records (e.g.: Instances without an underlying record in SRS) and entities in the Entities App

MARC Editors (QuickMARC and/or MARCcat)

  • When entities are recorded in a new bibliographic description in MARC, the tools should use an entity look-up service to automatically link the heading to the correct form of the Name/Label registered in the Entities App; this functionality will also be relevant for other bibliographic standards not yet built into FOLIO, e.g.: BIBFRAME, EAD, MODS, etc.
  • For entities recorded in bibliographic descriptions outside FOLIO (e.g.: in OCLC Connexion) and imported into Inventory, MARCcat and/or MARC SRS, FOLIO should report entities that do not match those in the Entities App; this report should be available for authority management functions within the Entities App.

Source Record Storage (SRS)

  • Authority record loads should automatically update matched records in the Entities App
  • Authority record loads should automatically tag new entity records as such for easy review as part of authority management functions
  • Authority record delete files from third-party authority vendor services should tag/flag matched records in the Entities App to:
    • Provide collection intelligence data/reports where records set for deletion are associated with “live” bibliographic records which may need to be updated by catalogers. 
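
The delete-file handling described above might look roughly like the following sketch, which separates entities that are safe to remove from those still attached to "live" bibliographic records. The function name and the shape of links_by_entity are assumptions for illustration.

```python
def flag_deletions(delete_ids, links_by_entity):
    """Split entity ids named in a delete file by whether records still use them.

    links_by_entity maps an entity id to the ids of bibliographic records
    linked to it. Returns (in_use, unused).
    """
    in_use = {}
    unused = []
    for entity_id in delete_ids:
        records = links_by_entity.get(entity_id, [])
        if records:
            # Still referenced: report the records a cataloger may need to update.
            in_use[entity_id] = sorted(records)
        else:
            unused.append(entity_id)
    return in_use, unused
```

The in_use mapping is the kind of data a collection intelligence report could be built from.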

Library Data Platform (LDP)

  • Reports based on preferred Name/Label updates (new/change/delete) in the Entities App.
  • Reports use Entities App data to allow for collections data analysis
  • Error log for resource records not updated as expected
  • Reports for headings that are inconsistent with entities in the Entities App
  • Reports/dashboards may include data for non-matches, multiple matches, and unrecognized terms, etc.

BIBFRAME App (one day)

  • When entities are recorded in a new bibliographic description in BIBFRAME, the tools should use an entity look-up service to automatically link the heading to the entity and incorporate the preferred label into the BIBFRAME data
  • For entities recorded in bibliographic descriptions outside FOLIO and imported into Inventory, BFCat, and/or BIBFRAME SRS, reports of entities that do not match those in the Entities App should be available for authority processing within the Entities App. Reports/dashboards may include data for non-matches, multiple matches, and unrecognized terms, etc.

EAD App (one day)

  • When entities are recorded in a new bibliographic description in EAD, the tools should use an entity look-up service to automatically link the heading to the correct form of the Name/Label
  • For entities recorded in bibliographic descriptions outside FOLIO and imported into Inventory, EADCat, and/or EAD SRS, reports of entities that do not match those in the Entities App should be available for authority processing within the Entities App.
    • Reports/dashboards may include data for non-matches, multiple matches, and unrecognized terms, etc.

Discovery Layers

  • APIs should allow for the Entities held in either the Entities App SRS or the LDP to be harvested for: 
    • Display of preferred Name/Label in discovery environments and other external tooling
    • Knowledge card population
    • Building discovery experiences, such as browsing by entity types or navigating between library resources by entities, and enabling semantic interaction among related entities for library patrons using the discovery layer

External Lookup Services -- APIs directly from specific data sources, and/or third-party external lookup services

  • The Entities App should interact with specific data source APIs to refresh the Entities Management SRS. This will happen using scheduled, automatic scripts based on institution-specific timelines, or via manual data import processes for data sources that do not support more automated processes. Updates to specific data sources stored in the Entities Management SRS should also be possible on an “as-needed” basis via the Entities App, using APIs directly from specific data sources and/or third-party external lookup services.

Glossary

  • Note: as readers ask for other terms to be defined, we will expand this section.

Cache: A copy of external data stored locally. Reasons one might cache external data include the need to build local indexes in support of performant search and browse requirements, to plan for outages in external data sources, and to protect against new versions of external data that are not yet ready to be consumed locally.
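
As a small illustration of the outage case in this definition, the sketch below keeps a local copy of each external record and serves the stale copy when the source cannot be reached. ExternalCache, the fetch callable, and the max_age_seconds setting are hypothetical; treating OSError (which includes ConnectionError) as the signal for an outage is an assumption of this example.

```python
import time


class ExternalCache:
    """Keep a local copy of external data and serve it when the source fails."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch              # stand-in for a call to the external source
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}               # uri -> (stored_at, data)

    def get(self, uri):
        entry = self._entries.get(uri)
        if entry is not None and self._clock() - entry[0] < self._max_age:
            return entry[1]              # fresh local copy
        try:
            data = self._fetch(uri)
        except OSError:
            if entry is not None:
                return entry[1]          # source unavailable: serve the stale copy
            raise
        self._entries[uri] = (self._clock(), data)
        return data
```

The clock parameter exists only so that the expiry behavior can be exercised without waiting in real time.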