
What is Codex?

Codex is a normalization and virtualization layer that allows Folio to integrate metadata about various resources regardless of format, encoding, or storage location. It is the piece that allows disparate resources to be surfaced using a common vocabulary and description.

Normalization: Codex removes differences in encoding and format to provide a single representation of all participating resources, regardless of how they are managed.

Metadata: Codex implements a lightweight (simplified) metadata model to describe resources. This common denominator can be mapped to most existing metadata models, thus providing a common vocabulary.

Virtualization: Codex spans storage locations. It does not matter whether a resource is managed locally or in a remote system. Furthermore, some systems may not be directly responsible for the management of resources, but may be aware of them (e.g. Orders). These systems can also participate in presenting normalized metadata to Codex - essentially providing “pseudo resources” for Folio.

Layer: In a layered representation of Folio, Codex sits at the top. It can be the starting point for all inquiries on resources. From this layer, one may drill down further into the lower, richer layers for any given selected resource.
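
To make this more concrete, the following is a minimal sketch of the kind of lightweight, normalized record Codex might present for any participating resource. It is illustrative only; the field names are assumptions loosely modeled on Dublin Core, not the actual Codex schema.

    // A minimal sketch of a normalized Codex record. Field names are
    // illustrative assumptions loosely based on Dublin Core.
    interface CodexInstance {
      id: string;             // identifier within the Codex layer
      title: string;
      contributor?: string[];
      publisher?: string;
      date?: string;
      identifier?: string[];  // e.g. ISBN, ISSN, DOI
      type?: string;          // e.g. "books", "serials"
      format?: string;        // e.g. "electronic", "physical"
      source: string;         // the module that manages the underlying resource
    }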

No really, what is Codex?

Domains

Folio is a platform that is based on a (specific) microservices architecture. Codex plays a key role in that architecture. A fully featured platform of microservices can easily consist of dozens of individual microservices.

However, some microservices are closely related to others, as shown below using common colors.

Those closely related microservices can be grouped together into what we call a domain. The modules which belong to the same domain will reflect the following characteristics:

  • a shared context

  • a shared vocabulary

  • awareness of each other

  • a shared data model


Codex is a Domain

Codex is a domain which consists of multiple modules (and apps). The most familiar Codex app is the Codex Search app. It provides the ability to search for resources across a number of other apps that may be used to manage resources: e.g. eHoldings, Inventory, institutional repositories, etc… A given app, such as the Codex Search app, typically consists of multiple modules.

But Codex Search is not the only app in this domain. The types of other apps which might be found in the Codex domain include: catalog exports; authority control; resource relationships; agreements apps. Unlike the Codex Search app, these other apps may require their own data, in which case the relevant Codex domain modules would implement their own data storage.

Codex addresses the Entanglement Problem

One of the challenges of a microservices architecture is that of managing the large number of individual microservices that come together to create a coherent solution. For Folio, part of that problem is addressed by Okapi which provides a registration mechanism for modules and the service interfaces they implement. However, it only provides a flat model where all modules sit at the same level.

Folio microservices are implemented as individual modules. In keeping with the principles of microservices, those modules should not need to be aware of the inner workings of other modules - especially across domain boundaries. Each microservice should concern itself with the capabilities that it is responsible for delivering. It should be unconcerned by other similar modules in the system and whether those are present, active, available, etc… Each should just mind its own business.

But in practice, domains, and the microservices they contain, form a system and there are necessary interactions between the microservices, including across domain boundaries. Those interactions create dependencies, and in a flat model, this leads to entanglement - thus breaking the promise of a loosely coupled system.

The Entanglement Problem

The interaction between microservices (i.e. modules) is an integration problem. It requires that individual modules be aware of other modules. For modules in the same domain, this is expected and not an issue. However, the same is not true across domain boundaries.

One way to manage interactions between modules is through direct integration. In that approach a module retains explicit knowledge of specific interfaces provided by other modules. It must deal with operational concerns by providing proper error handling and fallback scenarios should those other module(s) not be available, or even present in the system. It must also ensure ongoing compatibility as other modules evolve and interfaces are added, changed or retired. That effort is then repeated for all modules with direct dependencies.

Unfortunately, this direct integration approach leads to a highly coupled system that can quickly become inflexible and fragile. In the particular case of resource management, there are multiple domains involved. All of them would need to interact with the others at some point to deliver various functionalities.

Imagine that a Folio system has 3 resource managing modules (e.g. one from each of the inventory, KB, and acquisitions domains). Each will require 2 integration points, one for each of the other 2 domains. In total there are 3 integration links between the 3 domains.


This may seem like a manageable approach that can scale linearly. But it is not. Now imagine that a Folio system has 5 resource managing modules. In a flat modular system, each module will likely need to be aware of each of the other 4. As shown below, the number of interdependencies grows to 10.


In the general case, if a Folio system has N resource managing modules, the number of potential integration points is N(N-1)/2, which is not linear but scales on the order of N².

A different approach is to use a brokered integration. A new service can act as a coordinator between the different related modules. Each module in turn needs only to integrate with the brokering module. The problem of managing the state and availability of the different services can be achieved in one place. This allows the individual modules to “keep minding their own business” and remain blissfully unaware of other modules’ business. A brokering microservice does need to be aware of the other modules, because that is its business.

Essentially, the brokering module approach introduces a hierarchy. Through the use of a hierarchy, a module only needs to maintain integration with the level above it. It no longer needs to manage dependencies with its peers for the same purposes. With the use of a hierarchical layer such as Codex, the set of dependencies remains linear upon scaling. Individual resource managing modules only need to concern themselves with integration to Codex.
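
To make the scaling argument concrete, here is a small illustrative calculation (not part of any Folio module) comparing the two approaches:

    // Illustrative only: integration links among N resource managing modules.
    // Direct peer-to-peer integration grows as N(N-1)/2; a brokered hierarchy,
    // where each module integrates only with Codex, grows as N.
    const directLinks = (n: number): number => (n * (n - 1)) / 2;
    const brokeredLinks = (n: number): number => n;

    for (const n of [3, 5, 10]) {
      console.log(`${n} modules: direct = ${directLinks(n)}, brokered = ${brokeredLinks(n)}`);
    }
    // 3 modules: direct = 3, brokered = 3
    // 5 modules: direct = 10, brokered = 5
    // 10 modules: direct = 45, brokered = 10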


Codex is a hierarchical domain that acts as a coordinator for resource management. As a coordination domain it is Codex’s business to know about the other domains.

A Hierarchical Domain Model

A significant responsibility of Folio is to manage resources, and more specifically the records that describe those resources. However, not all parts of Folio require the same level of detail for those resources. Nor do they have the same scope requirements: Circulation doesn’t much care about the purchase price of an item, but Acquisitions certainly does. The following diagram focuses on resource management specifically. As shown, the resource managing domains can be organized into layers based on the depth of detail that they require and the purpose they serve.



Starting from the bottom:

Formal Records. These are the most detailed records available in the Folio system; it is where MARC records are found. The MARCcat app is used to edit, validate and even create MARC records, which live in Source Records. Eventually, Folio will provide similar support for Dublin Core, BIBFRAME and other non-MARC formats. Note that in addition to bibliographic records, formal records can include such things as Order artifacts (EDI) and License artifacts (e.g. scanned paper licenses). In support of all source records is a sophisticated Import system that manages the ingestion and lifecycle of these documents.

Working Records. These are the app-specific versions of records that are necessary for the apps to deliver their functionality. These records may be bound to related source records at the lower layer. For example, an Instance record in Inventory may be bound to a MARC record in Source Records. When bound to a formal record the working record does not need to duplicate all the fields contained in the formal records - it only needs those that are required to complete the functionality delivered by the apps in that domain. Note that working records may also contain additional fields, not found in the formal records, but which are transient or operational in nature.

Unifying Records. This top layer is the one in which Codex and its modules operate. The purpose of this layer is to provide a uniform and high-level view across all of Folio. In particular, Codex presents a complete and consistent view of all resource related objects from anywhere in the Folio system. Codex is afforded visibility into all the resource managing domains that might exist within a Folio system. This is not to say that Codex contains all the metadata related to any resource within Folio. Instead, resource records surfaced through Codex are linked to working (functional) records in the various resource managing apps. In turn those may also be linked to formal records, thus forming a path from Codex to the finest degree of metadata granularity.
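
The following sketch illustrates how the three layers might reference one another. The shapes and field names are assumptions for illustration; they are not the actual Folio schemas.

    // Illustrative sketch of the three record layers and how they link.
    interface FormalRecord {        // e.g. a MARC record in Source Record Storage
      id: string;
      format: "MARC" | "DC" | "BIBFRAME";
      rawRecord: string;            // the full encoded record
    }

    interface WorkingRecord {       // e.g. an Instance in Inventory
      id: string;
      title: string;                // only the fields the app actually needs
      sourceRecordId?: string;      // optional binding to a formal record
    }

    interface UnifyingRecord {      // what Codex surfaces
      id: string;
      title: string;
      workingRecordId: string;      // link down to the working record
      sourceModule: string;         // which app is the source of truth
    }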

Codex is the Entry Point for Resource Management

Folio is a modular platform. Consequently, functionality may be added to the platform at runtime, as needed, in the form of modules. But it also means that a Folio installation may be created with a reduced number of components: streamlined to provide only necessary functionality.

In the case of managing resources, there are likely a number of different apps whose modules could play a role. There may be an inventory app (or two), and/or an eHoldings app (or two), and/or an Institutional Repository (or two). It is always possible to go directly to any one of those apps and access the capabilities it provides. But that assumes pre-established knowledge that the app in question is actually responsible for managing those specific resources. However, in the more general situation, it is better to start with a Codex domain app as the entry point. From there it is possible to navigate to any part of the system in order to manage resources in the appropriate context.

Codex provides the starting point for locating and managing a resource in any of the constituent parts of Folio. Once a resource is located, it can be examined in full detail in the appropriate Folio app that is responsible for it (the source of truth). This is the case for the user experience of a person interacting with the Folio UI. But it is also true of any external system that needs to integrate with Folio for resources! Such a system should not integrate directly with individual resource managing modules in Folio, since none of them will contain a complete representation of all resources. Nor should the external system need to integrate with multiple modules and then perform its own merge and reconciliation. Instead it would integrate with Codex, which is the hierarchical entry point to all the resources located within a particular Folio installation. The external system does not need to know about the internal module configuration within a particular Folio installation. It can delegate that problem to Codex.

Codex is a normalizing Data Model

Since Codex is a domain, it provides its own conceptual data model for describing resources. That model is closely related to the BIBFRAME2 conceptual model and contains metadata fields very similar to those in the Dublin Core data model. The data model supports multiple entities such as Items, Instances, Packages and more, as described here. It strives to be a simplified data model that is the common intersection of the more complex and specialized data models used by the individual resource managing apps. The model is complete enough to suit the task of describing any Folio resource, yet small enough to limit the amount of metadata duplication between it and the other domains.

By implementing its own data model Codex provides a mechanism to normalize resource descriptions across all Folio modules. Rather than creating individual mappings between extant resource data models in each app, the Codex data model is a common intermediary, to which those extant data models can map themselves.
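
As an illustration of the crosswalk idea, the sketch below maps a hypothetical native record shape onto the CodexInstance shape sketched earlier. The native fields shown are invented for the example and are not the real eHoldings schema.

    // A sketch of a crosswalk from a module's native model to the Codex model.
    interface EHoldingsTitle {      // hypothetical native shape
      titleId: string;
      titleName: string;
      publisherName?: string;
      identifiers?: { type: string; value: string }[];
    }

    function toCodex(title: EHoldingsTitle): CodexInstance {
      return {
        id: title.titleId,
        title: title.titleName,
        publisher: title.publisherName,
        identifier: title.identifiers?.map((i) => `${i.type}:${i.value}`),
        source: "eholdings",
      };
    }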

Codex manages Relationships between Resources

As part of coordinating resource metadata between microservices, Codex is also instrumental in establishing relationships between resources, in particular when those resources might be described (source of truth) in different microservices.

This establishing of relationships between metadata managed in different services is simply an application of the concepts of linked data. Establishing a connection between two such resource descriptions allows them to remain fully within their respective managing microservices. It provides a solution to the dreaded practice of duplicating and copying metadata between system components.

When a relationship is established between two or more resources, that relationship needs to be persisted somewhere. The sensible place to store the relationship and its links to resources is in the Codex domain, in the app used to manage relationships.

For example, to describe a particular Work it may be necessary to establish a relationship between an ebook managed in eHoldings and a print book managed in Inventory. The work is itself an entity and therefore it needs to be persisted in a microservice that will be responsible for it (source of truth). One approach might be to attempt to duplicate the ebook metadata in Inventory or alternatively duplicate the print book metadata in eHoldings. However, either of those creates direct dependencies and entanglements between the two microservices. Furthermore, it would introduce a problem of data synchronization and might also create an ambiguous source of truth. A better approach is to use another microservice, which is neither Inventory nor eHoldings, to persist and manage the Work. That microservice would be Codex.
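
A minimal sketch of what such a persisted relationship might look like in the Codex domain follows; the structure and names are assumptions for illustration only.

    // Illustrative sketch of a relationship record stored in the Codex domain.
    interface ResourceLink {
      module: string;     // e.g. "inventory" or "eholdings"
      recordId: string;   // id of the record in that module (the source of truth)
    }

    interface WorkRelationship {
      id: string;
      type: "work";       // other relationship types are conceivable
      title: string;      // descriptive label for the work
      members: ResourceLink[];
    }

    const exampleWork: WorkRelationship = {
      id: "work-0001",
      type: "work",
      title: "Example Work",
      members: [
        { module: "inventory", recordId: "inst-1234" },  // print book
        { module: "eholdings", recordId: "ehold-5678" }, // ebook
      ],
    };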

Codex is Resource Central

Authorities and controlled vocabulary lists are a natural fit for Codex. Implementing these in the Codex domain makes them available to all domains equally - they can be linked to from Inventory, eHoldings or MARCcat. They would be usable not only in the narrow context of resource management, but also potentially in other areas of Folio.

Codex includes Codex Search

The most familiar use of Codex is to provide a unified search function for resources: Codex Search. Codex supports a generic resource metadata model based on Dublin Core. In order to participate, resource managing components (i.e. microservices) will implement Codex Search conforming APIs. The integration requires a metadata mapping - a data crosswalk between their native resource metadata representation and the Codex metadata model's representation. By implementing, then registering these API interfaces with Okapi, the resource managing components become contributors to the Codex Search functionality.

The Codex Search functionality can thus provide a single entry point to locate any resource throughout the entire Folio system, regardless of the specific module responsible for managing it. The use of a simplified but generic metadata model is in keeping with the Codex goal of understanding resources regardless of their local format or encoding.
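
The sketch below illustrates the federated style of Codex Search: broadcast a query to each registered Codex-conforming endpoint and merge the results. The endpoint paths and response shape are assumptions, not the actual APIs; it reuses the CodexInstance shape sketched earlier.

    // Illustrative sketch of a federated Codex search across registered endpoints.
    async function codexSearch(
      okapiUrl: string,
      tenant: string,
      token: string,
      endpoints: string[],   // one Codex-conforming endpoint per participating module
      query: string
    ): Promise<CodexInstance[]> {
      const searches = endpoints.map(async (endpoint) => {
        const res = await fetch(
          `${okapiUrl}/${endpoint}?query=${encodeURIComponent(query)}`,
          { headers: { "X-Okapi-Tenant": tenant, "X-Okapi-Token": token } }
        );
        if (!res.ok) return [];          // an absent or failing module is not fatal
        const body = await res.json();
        return (body.instances ?? []) as CodexInstance[];
      });
      return (await Promise.all(searches)).flat();
    }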

It is expected that specific resource managing modules may provide their own more advanced search capabilities. This is possible since, at the level of those modules, there exists richer and more specialized metadata not available from the Codex Search metadata model. One might start with Codex Search to locate a desired resource and its managing microservice, then drill down from there to more advanced searching.

Codex Search can also be used as an embedded component, allowing other Folio apps to provide a user experience for locating resources in any part of Folio and linking to them from another microservice or module. For example, an Orders app may include the ability to conduct an embedded Codex Search to locate and link a resource in a Knowledgebase for selection and inclusion in an Order.

A fully implemented Codex Search will allow searching for all forms of resources: instances; packages; providers. It will also return the holdings and status of returned results within the context of the Folio installation.

Use Cases

The following selective use cases are illustrative of the Codex concepts discussed above.

Use Case 1: Locating Resources

Folio is a highly modular system, where separate apps might be responsible for managing different resources. But modular also means that specific resource managing apps may or may not be available in a particular Folio system. In keeping with the microservices approach, the individual resource managing apps should also mind their own business and be as little aware of each other as possible. This is where Codex comes in: the business of microservices in the Codex domain is to know which resource managing apps are present and how to interact with any of them. (Codex operates at the unifying layer.)

How do I tell if a particular resource is available to an institution - for circulation, for access, or even just on order?



  • In this example there are 3 domains shown, each of which might represent any number of resource managing apps within.

  • None of the individual resource managing apps has a complete picture of all resources available in the system.

  • Codex is the starting point for locating a resource. Each of the resource managing apps “registers” itself to Codex so that resources may be located.

  • Each resource managing app informs Codex of the matching resources for which it is responsible.

  • Once the resource is located, the user can drill down into the responsible app to retrieve full details about that resource.

  • The Order system is a resource managing app within the acquisitions domain.


That last point deserves further explanation: the Order system is a resource managing app. This means that there are resources that naturally exist as part of the acquisitions domain. When an order is constructed, it will describe, through purchase order line items, the "things" (i.e. resources) that are being ordered. These are resources in their own right and they exist in the Acquisitions domain. Furthermore, it is the Order system's responsibility to manage those resources until they are received. Therefore, the Orders app can report those resources to Codex when asked to do so. After a "thing" has been received, it is no longer the responsibility of the Order system. A new resource will be described by Inventory or the KB or whichever part of the system is now responsible for managing it and reporting it to Codex. The Order system can now simply stop reporting that resource to Codex because it is no longer its responsibility.

This is a much simpler and more elegant solution than more traditional approaches, whereby pending orders might create temporary records in Inventory. In this case, the Order app does not need any awareness of the Inventory system. It does not need to know how to create a stub record there. Nor does it introduce direct dependencies between those two systems.
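
A minimal sketch of the "report until received" behaviour described above, reusing the CodexInstance shape sketched earlier; the purchase order line fields are hypothetical.

    // Illustrative sketch: an Orders module surfacing "on order" pseudo-resources.
    interface PoLine {
      id: string;
      titleOrPackage: string;
      receiptStatus: "Pending" | "Received" | "Cancelled";
    }

    function openOrderResources(poLines: PoLine[]): CodexInstance[] {
      return poLines
        .filter((line) => line.receiptStatus === "Pending") // stop reporting once received
        .map((line) => ({
          id: line.id,
          title: line.titleOrPackage,
          source: "orders",
        }));
    }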

Use Case 2: Exporting comprehensive Folio catalogs to an OPAC or Discovery system

It turns out that the problem of exporting a comprehensive catalog of all resource holdings to an OPAC is very similar to that of a Codex Search. The primary difference is that the actor performing the task is another system rather than an end user. An Edge API takes the place of the UI in making calls to Codex.

How do I synchronize my catalog with my OPAC or discovery system?


  • In this example there are 3 apps shown, but there could be more, such as additional KB apps, or Institutional Repository app(s) or even another Inventory app.

  • The Source Record Manager holds the MARC (or DC) records describing the resources in the catalog.

  • None of the individual resource managing apps has a complete picture of all resources available in the system.

  • Codex is the starting point for gathering resources to be exported to the OPAC. Each of the resource managing apps “registers” itself to Codex so that its managed resources may be aggregated into the OAI-PMH response.

  • OAI-PMH requires resource descriptions in MARC (or Dublin Core) format. These will be found in Source Record Storage (SRS). The resource managing apps will contain links to the appropriate bibliographic records in SRS.

  • An Edge API is used to deliver OAI-PMH responses to the external system. It allows the remote system to integrate with Folio using existing conventions (OAI-PMH); a minimal harvesting sketch follows this list.

  • The OPAC can be made aware of pending orders since the Order system can surface resources (pseudo-resources) that are “on order” - just as in Use Case 1 above.

  • There are no direct dependencies between the various functional modules (inventory, eHoldings, Orders) since all connections go through Codex.
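
The sketch below shows how an external discovery system might harvest such an export through the edge API. The base URL and API key parameter (apikey) are hypothetical; the OAI-PMH verbs and resumptionToken flow are standard to the protocol.

    // Illustrative sketch of an OAI-PMH harvest loop against a Folio edge API.
    async function harvest(baseUrl: string, apiKey: string): Promise<string[]> {
      const pages: string[] = [];
      let params = "verb=ListRecords&metadataPrefix=marc21";
      while (true) {
        const res = await fetch(`${baseUrl}?${params}&apikey=${apiKey}`);
        const xml = await res.text();
        pages.push(xml);
        // Keep requesting while the response carries a resumptionToken.
        const token = /<resumptionToken[^>]*>([^<]+)<\/resumptionToken>/.exec(xml)?.[1];
        if (!token) break;
        params = `verb=ListRecords&resumptionToken=${encodeURIComponent(token)}`;
      }
      return pages;
    }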

Use Case 3: Creating Associations between Resources

Codex is not just a search interface. There exists the need to create relationships between resources. Perhaps an ebook managed through eHoldings needs to be related to a print book managed in Inventory because both represent the same work. In keeping with the concepts of Linked Data we would want to create a Work object and link it to the different instance records that represent manifestations of that work.  

Where do I store the relationship between two different records?


It would be simple enough to create a structure to represent and store such a Work in Inventory, or likewise in eHoldings. But the problem arises when the two instances to be related exist in two different parts of Folio: in this example, in Inventory and in eHoldings. Do we create the work structure in Inventory? Or do we do it in eHoldings? Or do we do it in both? The answer is neither. We create the Work object in the Codex domain and link it to the resources that are managed respectively in Inventory and in eHoldings. This avoids a direct entanglement between Inventory and eHoldings. In this case, both can remain blissfully unaware of each other. Furthermore, since we are linking and not duplicating resource records between domains, we also avoid the problem of data synchronization.

Codex can connect resources from disparate parts of Folio and create arbitrary relationships between them (not just Works). For example, it can be used to create and manage categorizations. Similarly, the Codex domain can be used to manage and make available Authorities.

An important consequence here is that, in order to support relationships and authorities, the Codex domain requires its own storage capabilities for persistence.

Use Case 4: Inter-Folio Functionality

In the not too distant future, there will exist multiple Folio installations. For a number of reasons these will not want to live in isolation; there will be a need for collaboration between them. Motivations include: participating in a consortium; sharing holdings as part of a union catalog; direct interaction with another institution; a shared cataloging effort. From these a number of related use case scenarios can be identified.

  • An institution wants to make its resources available to other libraries

  • A Consortium wants to make its resources available to its members

  • An institution wants to participate in a “community zone”

  • A Union Catalog needs to be created


Each of these, and more, may be treated as separate use cases. However, there is a commonality between them all, which is that they concern themselves with Folio as a whole.

It is only a small leap from a Codex domain pulling together resources within a single Folio installation, to a Codex domain pulling together resources that are located in distinct Folio installations.

Each Folio installation has its own Codex domain consisting of relevant Codex apps. From this perspective, Codex is not only the starting point for resource management within each Folio. It is also the point where each Folio can pull in Codex compatible components from other Folio installations.


Multiple scenarios are illustrated above:

  • In red, a Consortium shares resources with its members

    • The consortium makes its Inventory available to the Codex in each of the member institutions. Institution-1's and Institution-2's Codexes (Codices) each pull in the Inventory from the Consortium Folio. They each effectively have two Inventories to work with: one local and one shared.

    • It makes its Acquisitions selectively available to the member institutions. In this case Institution-1 has access to both its local Acquisitions and the Consortium's Acquisitions, through its own Codex.

    • Integration can be selective. As shown here, Consortium Acquisitions may not have been made available to Institution-2, or that institution may have chosen not to integrate it into its Codex.

    • Similar forms of sharing can be envisioned for other Codex compatible contributors such as Institutional Repositories.

  • In blue, a Consortium subscribes to a commercial KB, which it makes available to a member institution which does not subscribe on its own.

    • Institution-2 has integrated the Consortium’s KB to its Codex.

    • Institution-1 already has its own KB subscription on its Codex, so it does not need to integrate the Consortium’s KB.

  • In purple, a union catalog is created that pulls together the inventories from Institution-1 and Institution-2.

    • The union catalog itself could be instantiated dynamically through a dedicated, minimal Folio installation: no Acquisitions; no Circulation; etc…

    • The union catalog is powered by what would likely be a dedicated app in the Codex domain.

    • The union catalog Folio installation would use its Codex to pull in resources from the remote Folio installations (a configuration sketch follows).
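
To make the consortial scenarios above more concrete, the following sketch shows what a per-installation list of Codex sources might look like for Institution-1. The structure is an assumption for illustration only.

    // Illustrative sketch: Codex sources configured for one Folio installation.
    interface CodexSource {
      name: string;
      kind: "inventory" | "kb" | "acquisitions" | "ir";
      location: "local" | "remote";
      folioUrl?: string;   // only for sources pulled in from another Folio installation
    }

    const institution1Sources: CodexSource[] = [
      { name: "Local Inventory", kind: "inventory", location: "local" },
      { name: "Local KB", kind: "kb", location: "local" },
      { name: "Local Acquisitions", kind: "acquisitions", location: "local" },
      { name: "Consortium Inventory", kind: "inventory", location: "remote",
        folioUrl: "https://consortium-folio.example.org" },
      { name: "Consortium Acquisitions", kind: "acquisitions", location: "remote",
        folioUrl: "https://consortium-folio.example.org" },
    ];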

38 Comments

  1. I'm always a little uncomfortable discussing domains as part of the architecture. Domains are a handy way to organize the conversations of SMEs and POs, but they don’t really have any physical reality in FOLIO; they are not an architectural construct or mechanism. FOLIO has a bunch of microservices, and it has a number of apps that operate against those microservices to do things that people find useful. Any given app may use several different microservices, potentially spanning "domains".

    One of the things that’s been clear for a very long time is that libraries need to have a way to make statements about bibliographic instances (instances in a BIBFRAME sense, roughly analogous to something of which you can purchase or hold a copy or an electronic entitlement). Like, do I own copies of it. Do I have it on order. Do I have electronic entitlements. Does anyone ever actually USE it. Am I covering the subject areas that my patrons need me to cover. It is desirable to establish bibliographic identity as much as possible, and to associate different kinds of facts and information with bibliographic instances.

    You can potentially virtualize some of this through mechanisms like the codex search, but if you want to do it well, if you want a consistent mechanism to make statements about what you have now, how that’s changed since last week, etc., you are much better off if you have a single place where you represent that stuff on a permanent basis, and where different apps can collaborate to ensure that bibliographic identity is managed.

    The instance storage in FOLIO is a candidate for such a place. It uses a data model which has evolved through two years of thoughtful SME effort to comprise a fairly compact 40-ish data elements that are deemed important to describe instances across a number of different functional needs. We can hang any number of more detailed descriptive metadata formats behind it, by linking to records or storing them locally, and we can associate things to it from different functional areas, like acquisitions, resource access, physical inventory management, (maybe) electronic resource management.

    At its heart, the instance storage gives us a potential place to manage bibliographic identity across different functional areas. This enables all kinds of good things to happen downstream.

    I would want such a thing to exist in FOLIO. I would want it to use a lean but sufficiently rich metadata model. I would ideally want only one such thing to exist in FOLIO. If someone wants to build a better one than the current instance storage, that would be a good discussion to have... but I'd like to understand what it would do better than the current one. I don't see FOLIO enhanced by having two competing ones.

    If the current instance storage is broken or doesn’t perform well, or lacks features, I’d fix them. If it’s so fundamentally broken that it CAN’T play the role of the point of management for bibliographic instance identity management, I’d like to understand why or how.

    My point is this: FOLIO needs a single place where apps can collaborate to ensure the best possible unified bibliographic picture across different apps that deal with bibliographic data. Right now the instance store is it. If it's broken, let's fix it, but let's also discuss when it's appropriate to do so given all of the urgent priorities that we're juggling in the project.

  2. Riffing off of Sebastian's comments, I'd also like to point out that the Inventory data model as it stands now is identical to what was originally proposed as the Codex data model. The Inventory Instance is already a normalization. I am leery of the idea of normalizing normalized data. The Instance normalization meets needs identified in numerous use cases.

    While I'm not particularly concerned with *where* authority data live (e.g., in Inventory, in Codex, or in some other domain), I am concerned that we acknowledge the inherent connectivity between authority data and bibliographic data. I don't see it being effective or useful to separate the idea of "work" from the work/expression (I like to call it wexpression) data currently being recorded in MARC authorities.

    I see a lot of potential for Codex not only for searching but for inter-app communication. I think of it being able to play a role I call "dispatcher" (based on emergency response models) -- all apps can speak directly to Codex and Codex can listen to/speak to multiple apps at the same time. That is, Codex can, in effect, relay and broadcast messages. I am also very interested in the idea of Codex as a starting point for discovery.

    Please note these are my general comments as an individual and not representative of the MM SIG.

    1. > I'd also like to point out that the Inventory data model as it stands now is identical to what was originally proposed as the Codex data model

      The Inventory data model is not identical to what was proposed for Codex. It was based on the Codex model, but then many more fields and structures were added - as would be expected. This can easily be seen by comparing the JSON schemas for instances between mod-inventory and mod-codex-inventory. Even at the conceptual level they are not identical; the inventory model has added objects such as Holdings.

      1. And this is one of my biggest gripes with the data model currently used by the Codex search app. It's a uselessly small set of fields, and it is not aligned with the inventory instance model which is fairly well established as representing what we want to say about bibliographic instances, with almost everything being optional so it can be dumbed down to the level of an electronic KB which is about the lowest denominator when it comes to bibliographic metadata. I don't believe that FOLIO benefits in any way from having two different models in play. The two models (inventory and codex) should never have been split, and as far as I'm concerned no credible reason has been articulated for why that happened, other than, two different people worked on them. I think we need one unified way to talk about instances, and later about their relationships with works, contributors, etc.

        If we can clear that up, then I think there's a basis for a much more constructive conversation about a common framework for bibliographic metadata.

        1. ... a uselessly small set of fields ... dumbed down to the level of an electronic KB which is about the lowest denominator ...

          The implication being that an electronic KB is useless?

          There are already more than 2 resource data models in Folio: MARCcat; inventory; KB; orders; codex. Each has its own specialized version that meets its specific needs and only its specific needs. There will be more in the future: e.g. institutional repositories and things we haven't even thought of. Is the intent to expand the Inventory data model every time something new needs to be accounted for? That would be highly problematic. It would also be counter to the notion of microservices that Folio has embraced, whereby microservices are responsible for defining these things within their own context (what I'm calling domains here).

          The very notion of Codex is to avoid the problem of the ever-growing universal data model: a data model that would become an imposition on any of the microservices attempting to implement it, with a large and growing number of elements irrelevant to any one of the domains. This is the justification that I have presented in the past, where Codex seeks the intersection of metadata description rather than its union. It is no coincidence that the Codex data model closely resembles that of Dublin Core.

          1. The implication being that an electronic KB is useless?

            Nope, not even one little bit. (smile)   The implication being that historically, electronic KBs tend to present fairly limited bibliographic metadata (but exquisitely precisely honed holdings information). 

            The very notion of Codex is to avoid the problem of the ever-growing universal data model: a data model that would become an imposition on any of the microservices attempting to implement it, with a large and growing number of elements irrelevant to any one of the domains. This is the justification that I have presented in the past, where Codex seeks the intersection of metadata description rather than its union. It is no coincidence that the Codex data model closely resembles that of Dublin Core.

            Practical experience with the inventory schema has shown that it in fact isn't ever-growing, but that there's a compact shared format that is meaningful for describing bibliographic resources, and which will grow quite slowly under careful curation. The problem with the DC, conversely, is that it tends to be insufficient for any real application, which is why most applications extend it.

            Meanwhile, optional elements in the inventory schema should not be a problem for any application, whether they are a source or a consumer of metadata. But an incomplete model is. In FOLIO, the inventory instance schema, I would argue, is the intersection of metadata description. The current codex model is not, precisely because it has never been subjected to and validated by any real use cases.

            1. Practical experience with the inventory schema has shown that it in fact isn't ever-growing, but that there's a compact shared format that is meaningful for describing bibliographic resources, and which will grow quite slowly under careful curation. The problem with the DC, conversely, is that it tends to be insufficient for any real application, which is why most applications extend it.

              Perhaps, but the only thing that we've attempted to use the Inventory schema for is traditional bibliographic records found in an ILS. That it works well for the one well-understood bibliographic format is not a sound argument for saying it will work for other formats that look and behave nothing like MARC.

              (In other words, one could say that the current Inventory model has never been subjected to non-MARC real use cases.)

              1. (In other words, one could say that the current Inventory model has never been subjected to non-MARC real use cases.)

                I think this is a most valid point

                1. Peter Murray and Theodor Tolstoy, we very much hope to make BIBFRAME the next use case in Inventory, and to do work with real bibliographic metadata which originates from a non-MARC bibliographic data format (smile)

                  1. That would be great, Charlotte Whitt, but what I meant was data that is non-MARC and non-BIBFRAME as well, for that matter. Something that is a bit outside of the normal bibliographical descriptions but is still a valid thing/item to circulate and manage within a library inventory.

                    1. Hi Lisa Sjögren, should we maybe during this week's bug-fest try to test that use case Theodor mentions above and

                      1) do a description of a non-bibliographical 'thing'

                      2) test check-out/check-in of this thing

  3. +1 to Laura's comment about authorities. 

    I am concerned that we acknowledge the inherent connectivity between authority data and bibliographic data. I don't see it being effective or useful to separate the idea of "work" from the work/expression (I like to call it wexpression) data currently being recorded in MARC authorities.

    I think this is spot on. I think the BIBFRAME conversations have taught us to think of works, instances, agents, etc., as just different but very highly related entities that make up a descriptive universe. They have the same needs for curation, sharing, provenance management, etc. 


    1. Inherent connectivity, yes. I'm wondering a bit about whether there's a division of use cases. My thinking here is still a bit murky, this is all in-progress thinking.

      This vision document highlights three problem domains: (1) the need to search through all of our "stuff" and ask questions like "what do we have?"; (2) the need to export information about stuff that we have for reuse in other systems, including but not limited to discovery systems; and (3) the need to manage entities like works, instances, agents, etc. 

      I think (2) has been discussed sufficiently elsewhere in these comments.

      The combination of (1) and (3) lead me to wonder whether inventory can be deconstructed, so that searching of the inventory is somehow separated from managing holdings and items. At least those seem like distinct operations. That is, while holdings and items need to be managed and know what instance they are associated with, just how intertwined must they be? Is it enough that holdings and items know what they are associated with and the searching somehow be handled separately?

      Thinking about (3), are there reasons to put all of the entities together, or would it make sense to keep some logical separations (even if storage is shared)? The record-based formats are one thing, like MARC or MODS. I assume they will need separate storage per format. Thinking of linked data formats/ontologies, would it make sense to mix everything together or might it make sense to keep bibliographic data separate from, say, authority data? I really do mean that as an open question. Is there something different about what we need to do with bib data vs. authority data, or are they really the same at the operational level?

      For the moment I find it helpful to think more specifically about authorities. I think it non-controversial that we would want FOLIO to manage authorities in a linked data way. That would include allowing record-based formats like MARC to reference such authorities and somehow make use of the most recent data. It also means being able to use standardized authorities, which may be defined in different ontologies (LC Names, VIAF, MeSH, Getty vocabularies), alongside locally-defined authorities (which may be defined using some of the same ontologies), and to keep straight the source of truth for any of these entities. At the authorities level, I could imagine wanting to have a local cache of LC Names or Getty ULAN names or what have you, to periodically flush or refresh that cache, or to periodically harvest updates and prune deleted statements, and absolutely not remove the local names at the same time. So managing the use of national standards alongside use of local names. (And eventually maybe become a publisher of the local names.) And of course I'm speaking more generally than just names.

      Assuming those are all correct, do they also apply to bibliographic entities? I see a lot of similarities. Are there operations on the bibliographic entities that are different? Reasons that we might want to think of them as somehow being a different category?

      Apologies if this is murky, I'm still sorting issues in my head.

      1. Tod, really good questions.

        I agree that the present inventory app can and should be deconstructed. If you look at the underlying microservices, I think a lot of that decomposition has already taken place. I would love to see the management of bibliographic identity separated from the management of holdings and items. The UX has evolved a little bit through experimentation, being one of the very first apps designed. I think today, it might be approached differently. It might be fun to see someone sketch out a deconstructed app based on the same underlying microservices.

        I'm really curious to see an exploration of authorities. I feel like MARC thinking groups all types of authorities into one group of auxiliary "databases". This suggests giving them a separate UX. But in a BIBFRAME ecosystem, would everything potentially share the trait of being just another dataset maintained elsewhere? In other words, are "instances" qualitatively different than works and subjects, say?

        This is where a UX-anchored exploration of the way that we view and engage with BIBFRAME entities could be really useful in my view. When trying to think about this myself, I keep finding myself stuck in past models and the superficial similarities between BIBFRAME and record-oriented models.

        1. I've long thought that when trying to move past MARC that libraries would have been better served by first exploring linked data in the realm of authorities, rather than starting with descriptive metadata, because there are already rich relationships and syndetic structure encoded. So it's a natural fit, a good place to figure out how this could work. I think the amount of free-text transcription and relatively flat structure in the legacy bibliographic practices make it much more difficult to shift mental models.

          Modeling authorities first might be a way to get started, and then see how that model does and does not fit bibliographic data.

      2. The combination of (1) and (3) lead me to wonder whether inventory can be deconstructed, so that searching of the inventory is somehow separated from managing holdings and items. At least those seem like distinct operations. That is, while holdings and items need to be managed and know what instance they are associated with, just how intertwined must they be? Is it enough that holdings and items know what they are associated with and the searching somehow be handled separately?

        This is a good observation.  A lot has been bundled into what has become the Inventory app.

        1. I think this is where many are seeing that inventory can be a catch-all for searching, discovery, and managing instances/holdings/items. This is very similar to Alma (without the instances, of course), especially in light of the new Primo VE, which has a tighter integration with Alma - this means that discovery configurations often overlap or get mixed up with metadata configurations and mappings. Would it make more sense to think of managing inventory in terms of the linked data that was brought up above? The association is then a reference or set of references. With searching, these references wouldn't be lost but could be coupled in unique and perhaps more powerful ways - I'm thinking of SPARQL endpoints that people like to play with. This could have interesting applications for reporting/analytics where you create queries that follow and combine the linked connections. In terms of discovery, there are resources that might not be appropriate as inventory, such as bringing in metadata from remote repositories. It seems that inventory might not be the place to handle such resources.


  4. I was leaning towards the Codex and then after today's metadata management meeting am leaning towards inventory with no codex. However, I'm still asking questions about use cases and Folio working from the ground up. One place where Inventory and Codex might not be identical is discovery. And some of my thoughts that follow might be my just being new to Folio and not fully understanding Inventory. The use case that I'm thinking of is the one mentioned above in the document:

    Use Case 2: Exporting comprehensive Folio catalogs to an OPAC or Discovery system

    It turns out that the problem of exporting a comprehensive catalog of all resource holdings to an OPAC is very similar to that of a Codex Search. The primary difference is that the actor performing the task is another system rather than an end user. An Edge API takes the place of the UI in making calls to Codex.

    How do I synchronize my catalog with my OPAC or discovery system?


    I'm not sure where this happens in Inventory. Where do we configure the file with the needed frequency, delivery options etc to a discovery service or OPAC layer? With this question, I'm not making a case for the Codex. Perhaps this could be a simple add on in settings on exporting to discovery systems. I just thought I would add this to the discussion.

    1. Just to clarify, I don't think anyone is suggesting we get rid of Codex – there already is and needs to be the Codex search app at the very least.

      1. I was also thinking how each app would be able to handle communicating with other apps. Wouldn't that code have to be written in to each app? What if communication with a future app can't be predicted? Would it be easier to just write one communication standard which is the codex? In many ways, I'd like to hear what the developers and sys op people think about this. Could it be that the codex turns more into a back end search and communication mechanism and we as users work in inventory?

        1. I was also thinking how each app would be able to handle communicating with other apps. Wouldn't that code have to be written in to each app? What if communication with a future app can't be predicted? Would it be easier to just write one communication standard which is the codex? In many ways, I'd like to hear what the developers and sys op people think about this. Could it be that the codex turns more into a back end search and communication mechanism and we as users work in inventory?

          The way apps communicate with other apps in FOLIO is through APIs on the back-end. These are defined by developers and form a really important part of the back-end architecture. Any developer can introduce a new API for a new function, but there are also APIs which are 'core', in the sense that many other apps may depend on it, and so they should only be changed carefully, and ideally in a backwards-compatible way.

          The way inventory works is, it exposes an API that lets anyone that knows about bibliographic instances add information about them to the instance storage module. It doesn't matter if it originates from a purchase order receipt or from a batch load or the MARC editor or some other thing, and it doesn't matter what format it originated in. The apps that know about instances can map those instances to the shared instance data model and push them into the shared instance storage module. By doing this carefully, we can try to ensure that anything having to do with "the lord of the rings" is represented with the same ID in FOLIO, associated with lightweight instance-level metadata, but with a way to get to the richer metadata (eg MARC) if you need it, and with a way to tie together the different activities that coincide around a bibliographic instance. So the instance store becomes a kind of super-index that spans all the apps that may maintain data that involves or intersects with bibliographic data.

          The current codex search app works very differently. Every app that supports codex searching exposes a standard API. There's a "multiplexer" that is part of the codex search app that takes a query and broadcasts it to all of the available codex endpoints. Then it gathers the results into one window. The benefit is that the data can be maintained anywhere, there is not a central store. The drawback is that the search may feel a little slow, and it's a lot harder to establish identity, to merge things on the fly. It's also impossible to synchronize an external discovery layer's index from the current codex, because it has no knowledge of when things change in the systems that it accesses. It only knows how to forward queries. 

          I hope this makes sense!

        2. I was also thinking how each app would be able to handle communicating with other apps. Wouldn't that code have to be written in to each app? What if communication with a future app can't be predicted? Would it be easier to just write one communication standard which is the codex?

          I think that captures well the point in this document about entanglement. If every app needs to know about every other app, then that creates dependencies both at the development level and at the operational level. There is no expectation of a shared data model between apps; with microservices, each microservice (or ideally the set of apps that form a domain) is able to define its own data model, perfectly optimized for its needs.

          1. If I'm understanding correctly, in this role Codex would be working behind the scenes, relaying information? So that, for instance, Inventory and Orders would both share information with Codex, and Codex might share information gathered from Orders with Inventory – rather than Inventory and Orders sharing information directly with each other?

    2. Hi Jennifer. If the system works like inventory, and everyone contributes information about bibliographic instances to it, then there are a couple of strategies we can use to drive a discovery layer. I'm assuming here that the discovery layer has its own Index and that it'll probably want to do its own "massaging" of metadata to suit its needs.

      1) Push: Whenever someone adds or changes an instance in inventory (or whatever thing works like inventory, whatever it is called), we can actively PUSH the change to the discovery layer. That way, the change will become visible to patrons using the discovery layer almost immediately. 

      2) Pull: This is probably the more common approach today. The discovery layer can periodically poll the inventory (or whatever) for changes, and receive all of the instances that have been added or changed. In this case, the scheduling will probably be managed within the discovery layer's configuration.

      There are variations of 2) where FOLIO would periodically push a file of changes to an external location like an FTP site... functionally, that is similar to 2 in that it's a scheduled event and changes will be synchronized with the discovery layer in batches. It's a little more brittle than 2 because you have FOLIO pushing changes periodically, and then a second periodic event where the discovery layer picks up the changes.

      The current codex search application, which is basically a broadcast search, does not really help us drive an external discovery layer, because it has no knowledge of the actual contents of the systems it is accessing – it is just forwarding searches and displaying results.

      We may have to support all of these options to meet the needs of different discovery layers. I personally like 1) and feel like that's where we should aim in the future, but we have to allow for the fact that many/most discovery layers may not be designed to receive updates on the fly.

      1. This is super helpful Sebastian both for this and above. After reading that I'm still left with the question of Why Codex? Some part of me thinks this is a good idea perhaps because it's better to have it now and then see what we can do with it later. Another part is still questioning what we can do with it.

      2. If the system works like a Codex OAI-PMH app then there is no need for "everyone to contribute bibliographic instances" (i.e. copy them). Everyone just goes about their business and provides them to the OAI-PMH service when requested. The Codex layer will do the work for you. It will retrieve the linked MARC records from SRS and take care of packaging them up for OAI-PMH.

        The same optional push/pull strategies would be possible from Codex.

        My preference is for one of the other variations of 2), where the Folio system and Discovery system can be most decoupled and thus less vulnerable to either one of them being offline or too busy momentarily. This is the variation where Folio only sends out a change notification message when the change occurs. The message is safely captured in a messaging system. It can then be acted upon by the discovery system in its own time. Since only a small notification is sent, the risk of data loss is minimal and the approach is thus more robust. It's like receiving a delivery notification in your mailbox when you're not home and going to the post office to pick up the parcel when you get home. Now imagine that the post office is open 24x7 - that's better than letting them try to deliver again tomorrow when you already know you won't be home.

        1. This is the variation where Folio only sends out a change notification message when the change occurs. The message is safely captured in a messaging system. It can then be acted upon by the discovery system in its own time.

          Just to be clear, the changes notifications would be on the order of "record X was updated", "record Y was deleted," "record Z is new," maybe with a time stamp. Maybe not exactly, but roughly that level of minimal, right? We still need an endpoint for the Discovery system to ask for those records, probably in batches, and it sure looks like the batch export that needs to be built anyhow. Meanwhile the bookkeeping and storage for the message queue itself is very simple and lightweight, and could even have multiple subscribers. And if any party falls over they can pick up where they left off.

  5. It sounds like tomorrow's sprint demo will include a discussion of integration between purchase orders and Inventory. It would probably be useful to watch for thoughts and inspiration.

  6. As promised, here is a link to the MM SIG's google doc version with inline comments.

    https://docs.google.com/document/d/1NQamK4fSi0WRXonIgBBbpCTauac8BMdcAkY5mGXlIL0/edit#

    And two questions from me, as MM convener: 

    What parts of this have impacts on development and need to be addressed immediately?

    How can we best move this discussion forward?

  7. On 'The Entanglement Problem' - Seems like the use of HTTP, URLs and JSON as a substrate solves this problem from a syntactic perspective. From a semantic perspective, isn't it better that modules are aware of resources as schemas and services presented at URLs? Piping all traffic through a single endpoint seems to be in conflict with the microservices architecture we aspire to.

    1. Codex plays a role in the integration of microservices on the system. The microservices of Codex act as brokers to the other microservices. They add value in the same way as real estate broker services, or flight broker services. Is it more effective to book a flight by visiting the website of every possible airline? Or is it better to use an aggregating website that offers you choices from all the airlines? You don't have to worry about whether each airline is still in business, or doesn't fly a particular route during winter months, etc... Once you have located the flight you want, you are of course free to visit the airline's site directly in order to get more details or book directly with them. The same is true with Codex: it gives you a single point of entry to locate any resource, from which you can go directly to Inventory or the KB or Orders or the IR.

      Is it really better that modules need to be aware of all the different resource schemas and services (each of them unique) that may or may not be available in a particular system?

  8. Something we may want to consider from a system architecture standpoint:

    If we like the idea of a loosely coupled system, and a microservices platform certainly can be that, we may not want to move the entanglement problem to the Codex. Moving the module interactions to Codex has some advantages, but you still have the problem of maintaining those interface relationships, albeit fewer of them. One of the biggest challenges in microservices architecture is to avoid a ton of simultaneous, synchronous API calls. This can cause your system to perform very poorly.

    Might we consider using a data and event streaming platform (i.e. Apache Kafka) to handle those interactions instead? It solves the entanglement problem while reducing the number of API calls for a given action. Some of those interactions (between permissions and users, for example) are necessary; others can be done asynchronously. For Codex to be performant, it would seem a good option.

  9. "Codex is a normalization and virtualization layer that allows Folio to integrate metadata about various resources regardless of format, encoding, or storage location."

    Point of clarification: the Codex is only virtual if it is recreated on the fly at each use, i.e., the Codex doesn't exist in any semi-permanent state. If it is indeed an aggregate view that reflects some sort of normalized repository of "records," then it is not virtual. Hence the question, does anything actually exist "in" the Codex?

    1. Right now the Codex is virtual (a federated search across Codex sources), but that was not the long-term intention of the Codex.  Pretty soon the Codex is going to need to be some sort of permanently stored data so there are identifiers that can be used to do things like cluster identical items from disparate Codex sources, link instances across Codex sources to work records, and so forth.  In that way, the Instance portion of what we now see as the Inventory app starts to look a lot like what we want Codex to be.

    2. It should be noted that even if there is permanent storage of Codex records, those records are not the source of truth – they augment rather than replace the source records and are always derivatives of the records from the Codex sources.  That there is permanent storage allows for the establishment of persistent identifiers and certain indexing and clustering optimizations.

  10. This is a write-up I put together a few months ago that might be helpful in this conversation.

    It makes the case that the FOLIO UI needs a unified search input front and center. Instead of being in a standalone "Codex Search" app, searching across diverse types of records becomes the primary way to navigate around FOLIO. That type of power is a huge part of the reason for something like the Codex layer to exist.

    This is the level of simplicity FOLIO's users experience every day. When you're logged into Facebook, you don't have to toggle between friends, events, and groups to search. There's a "federated" search input that searches across the entire graph of relational data. Patron-facing UIs are also constructed around the idea of a single, powerful search input.

    Current problems

    - As the list of FOLIO apps grows, users have to process which app will have the data they need.
    - Search filters are very long and unwieldy in some existing apps.
    - Clicking on a related record (for example, clicking on a checked-out title on a user's record) jarringly switches the context to a different app.

    Approach

    Introduce a search input to FOLIO's header that takes precedence over links to apps.

    The new search input would search across every data type that exposes an interface to it - users, inventory, agreements, requests, vendors, etc.

    The input can feature a typeahead to help users get to the record they want even faster.

    The search filters visible after a search would be faceted - only filters relevant to those results appear, with counts next to them indicating the number of matching search results.

    Clicking on a related record can maintain the search without having to kick out to another "app" context.

    Deprecate the Codex Search app.

    Other UI considerations

    - This reduces the number of actions needed to get to a specific record.
    - The MultiColumnList (table) design does not work for a heterogeneous result set. Search result listings likely will need to be closer in design to Google search results or Amazon product listings.
    - An infinitely scrolling list may not be the appropriate approach for a heterogeneous set of search results.
    - Search filter construction can be centralized, but still very flexible.
    - Modules/apps take far less precedence than the detail records themselves.
    - The central reason for apps to exist would be for workflows that are more complicated than just viewing/editing a single record (like check-in/check-out, bulk cataloging, etc).
    - The front-end architecture would need a way to expose detail record UIs of different types to the central search UI.


    This is a dramatic shift that moves FOLIO's UI away from the sandboxing paradigm of mobile phone OS apps. As we're discovering, FOLIO is not a set of data sandboxes. It's a collection of views and tools over a highly relational set of data.

    1. Jeffrey — these are great comments about the UI, but I fear they might get lost here in what is mostly an architecture discussion.  I suggest posting your text as well to discuss.folio.org in the UI/UX category so it will be seen by a wider audience.