Status: DONE
Stakeholders:
Outcome: DR-000002 - Tenant Id and Module Name Restrictions
Due date:
Owner:

NOTICE

This decision has been migrated to the Technical Council's Decision Log as part of a consolidation effort. See: DR-000002 - Tenant Id and Module Name Restrictions


Truncation problem

PostgreSQL silently truncates identifiers after 63 bytes.

The PostgreSQL schema name is <tenant id>_<module name>. With a long tenant id, truncation may produce the same schema name for two different modules, for example for mod-inventory and mod-inventory-storage.
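A minimal sketch of the truncation, assuming a hypothetical 50-byte tenant id chosen only to push the name past the limit. The helper schema_name is illustrative and not part of Okapi or any module; hyphen conversion is left out because it does not change the length.

```python
# PostgreSQL keeps at most 63 bytes of an identifier.
MAX_IDENTIFIER_BYTES = 63


def schema_name(tenant_id: str, module_name: str) -> str:
    """Build <tenant id>_<module name> and cut it to 63 bytes (illustrative helper)."""
    full = f"{tenant_id}_{module_name}"
    truncated = full.encode("utf-8")[:MAX_IDENTIFIER_BYTES]
    return truncated.decode("utf-8", errors="ignore")


# Hypothetical tenant id of 50 bytes, used only for this demonstration.
tenant = "t" * 50
a = schema_name(tenant, "mod-inventory")
b = schema_name(tenant, "mod-inventory-storage")
print(a == b)  # True: both names are cut to the same 63 bytes
```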

Name clash problem

The PostgreSQL schema name is <tenant id>_<module name> with minus/hyphen converted to underscore.

Tenant id foo and module name bar-baz result in schema name foo_bar_baz.

Tenant id foo-bar and module name baz result in the same schema name foo_bar_baz resulting in a name clash.
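The clash can be reproduced in a few lines. This is a sketch of the naming rule as described above; the function name schema_name is an assumption made for illustration.

```python
def schema_name(tenant_id: str, module_name: str) -> str:
    """Join tenant id and module name, converting minus/hyphen to underscore."""
    return f"{tenant_id}_{module_name}".replace("-", "_")


first = schema_name("foo", "bar-baz")
second = schema_name("foo-bar", "baz")
print(first, second, first == second)  # foo_bar_baz foo_bar_baz True
```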

Module name uniqueness problem

A PostgreSQL schema name is case insensitive, and the module converts minus/hyphen to underscore because minus/hyphen is not allowed in a PostgreSQL schema name.

However, Okapi's module name is case sensitive and may contain lower and upper case letters. And it allows both minus/hyphen and underscore.

The module names mod-foo and Mod_Foo result in the same PostgreSQL schema name when created for the same tenant.
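One way to spot such collisions in a list of module names is to group them by their normalized form. The sketch below assumes that lowercasing models PostgreSQL's handling of unquoted identifiers; normalized and find_collisions are hypothetical helper names.

```python
from collections import defaultdict


def normalized(module_name: str) -> str:
    """Convert minus/hyphen to underscore and fold to lowercase."""
    return module_name.replace("-", "_").lower()


def find_collisions(module_names):
    """Return normalized names that are shared by more than one module name."""
    groups = defaultdict(list)
    for name in module_names:
        groups[normalized(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


collisions = find_collisions(["mod-foo", "Mod_Foo", "mod-bar"])
print(collisions)  # {'mod_foo': ['mod-foo', 'Mod_Foo']}
```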

Reserved key word problem

PostgreSQL has a few reserved key words with underscore:

  • CURRENT_CATALOG
  • CURRENT_DATE
  • CURRENT_ROLE
  • CURRENT_TIME
  • CURRENT_TIMESTAMP
  • CURRENT_USER
  • SESSION_USER

Trying to use them as a schema name results in a syntax error.

Example: If a module has the name user, then enabling it for tenant id current or session will fail because the resulting schema name is current_user or session_user.
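A check for this case could look like the following sketch. The keyword set is copied from the list above; the function name is_reserved_schema_name is an assumption for illustration.

```python
# Reserved key words with underscore, taken from the list above.
RESERVED_WITH_UNDERSCORE = {
    "current_catalog", "current_date", "current_role", "current_time",
    "current_timestamp", "current_user", "session_user",
}


def is_reserved_schema_name(tenant_id: str, module_name: str) -> bool:
    """True if <tenant id>_<module name> equals one of the reserved key words."""
    schema = f"{tenant_id}_{module_name}".replace("-", "_").lower()
    return schema in RESERVED_WITH_UNDERSCORE


print(is_reserved_schema_name("current", "user"))  # True
print(is_reserved_schema_name("session", "user"))  # True
print(is_reserved_schema_name("foo", "user"))      # False
```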

Kubernetes and AWS ECS label problem

Kubernetes label names must follow the DNS label standard: 

  • contain at most 63 characters
  • contain only lowercase alphanumeric characters or '-'
  • start with an alphabetic character
  • end with an alphanumeric character

Labels like mod-inventory-storage-23-0-2 are commonly used and may exceed the maximum length if a long module name is combined with a long version number.
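The four rules above can be expressed as one regular expression. This is a sketch only: label_for is a hypothetical helper that mimics how such labels are commonly built (dots in the version replaced by hyphens), and the overlong module name is made up to show a failing case.

```python
import re

# Starts with a letter, ends with a letter or digit, at most 63 characters,
# only lowercase alphanumeric characters or '-' in between.
DNS_LABEL = re.compile(r"[a-z]([a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_label(label: str) -> bool:
    return DNS_LABEL.fullmatch(label) is not None


def label_for(module_name: str, version: str) -> str:
    """Hypothetical helper: append the version with dots replaced by hyphens."""
    return f"{module_name}-{version.replace('.', '-')}"


label = label_for("mod-inventory-storage", "23.0.2")
print(label, len(label), is_valid_label(label))  # mod-inventory-storage-23-0-2 28 True

# Made-up 64-character module name, used only to show the length limit.
too_long = label_for("mod-" + "x" * 60, "1.0.0")
print(len(too_long), is_valid_label(too_long))  # 70 False
```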

AWS ECS has these restrictions:

  • Service Name: Up to 255 letters (uppercase and lowercase), numbers, hyphens, and underscores are allowed.
  • TargetGroup Name: A maximum of 32 alphanumeric characters including hyphens are allowed, but the name must not begin or end with a hyphen.

The existing module name mod-data-import-converter-storage already exceeds this 32 character limit. Sysops have assigned a special TargetGroup Name for this module; such a workaround should be avoided.
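The TargetGroup Name rule can be checked with a short sketch like this one. The pattern is derived from the restriction as stated above, and is_valid_target_group_name is a hypothetical helper, not an AWS API.

```python
import re

# At most 32 alphanumeric characters or hyphens, not beginning or ending with a hyphen.
TARGET_GROUP_NAME = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,30}[A-Za-z0-9])?")


def is_valid_target_group_name(name: str) -> bool:
    return TARGET_GROUP_NAME.fullmatch(name) is not None


long_name = "mod-data-import-converter-storage"
print(len(long_name), is_valid_target_group_name(long_name))  # 33 False
print(is_valid_target_group_name("mod-inventory-storage"))    # True
```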

Other restrictions

The problems stated above need to be solved.

More restrictions exist that are already enforced:

  • Tenant id must not begin with a digit because PostgreSQL schema names must not begin with a digit. This is already enforced by PostgreSQL.
  • Okapi restricts the letters in a tenant id to lowercase a-z. Accented letters and other Unicode characters are not allowed. (Okapi's regexp)
  • Okapi doesn't allow a module name to contain a minus/hyphen followed by a digit because this starts the version suffix (Okapi's ModuleId parsing).

Solution

Proposed solution:

Limit tenant id to 31 bytes. Disallow underscore in tenant id. Regexp: [a-z][a-z0-9]{0,30}

Restrictions for back-end modules:

  • Module name can contain only lowercase letters, digits and minus. Start with a letter. Disallow minus followed by a digit or a minus.
  • Limit module name to 31 bytes. Disallow uppercase letters. Disallow underscore.
  • Regexp: [a-z]([a-z0-9]|-(?=[a-z])){0,30}
  • Disallow these module names: catalog, date, role, time, timestamp, user
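With both parts limited to 31 bytes, the longest possible schema name is 31 + 1 + 31 = 63 bytes, which is exactly the PostgreSQL limit. The sketch below applies the two regular expressions and the list of disallowed names from above; the function names are assumptions for illustration and are not Okapi's API.

```python
import re

TENANT_ID = re.compile(r"[a-z][a-z0-9]{0,30}")
MODULE_NAME = re.compile(r"[a-z]([a-z0-9]|-(?=[a-z])){0,30}")
DISALLOWED_MODULE_NAMES = {"catalog", "date", "role", "time", "timestamp", "user"}


def is_valid_tenant_id(tenant_id: str) -> bool:
    return TENANT_ID.fullmatch(tenant_id) is not None


def is_valid_module_name(module_name: str) -> bool:
    return (MODULE_NAME.fullmatch(module_name) is not None
            and module_name not in DISALLOWED_MODULE_NAMES)


print(is_valid_tenant_id("foo"))                                  # True
print(is_valid_tenant_id("foo_bar"))                              # False
print(is_valid_module_name("mod-inventory-storage"))              # True
print(is_valid_module_name("mod-data-import-converter-storage"))  # False
print(is_valid_module_name("Mod_Foo"))                            # False
print(is_valid_module_name("user"))                               # False

# The longest names that pass still fit into 63 bytes.
longest_schema = "a" * 31 + "_" + "a" * 31
print(len(longest_schema))  # 63
```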

Migration

Some FOLIO installations use tenant ids with an underscore, which is no longer allowed.

One back-end module fails the new module name restrictions:

  • mod-data-import-converter-storage is 33 bytes long, exceeding the length limit of 31 bytes.

Tenant name migration

To reduce the downtime in multi-tenant installations it must be possible to migrate one tenant at a time.

Okapi and the modules should provide APIs and/or scripts to do the migration.

mod-data-import-converter-storage rename

The shorter name for this repository and module will be mod-di-converter-storage with only 24 bytes (MODDICONV-259). An earlier candidate was mod-data-import-conv-storage with 28 bytes.

Follow the guide at https://dev.folio.org/guides/rename-module/

When the renamed module executes the tenant upgrade, it checks whether the schema with the old module name still exists. If so, the schema is renamed, the PostgreSQL ROLE is renamed, and a new ROLE password is assigned if needed.
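The rename step might boil down to two SQL statements, sketched below. This assumes that the ROLE carries the same name as the schema and that tenant id and module names have already been validated, so they are safe to place in quoted identifiers; rename_statements is a hypothetical helper, not the module's actual upgrade code.

```python
def rename_statements(tenant_id: str, old_module: str, new_module: str):
    """Build the SQL to rename schema and role (assumes role name == schema name)."""
    old = f"{tenant_id}_{old_module}".replace("-", "_")
    new = f"{tenant_id}_{new_module}".replace("-", "_")
    return [
        f'ALTER SCHEMA "{old}" RENAME TO "{new}"',
        f'ALTER ROLE "{old}" RENAME TO "{new}"',
    ]


statements = rename_statements(
    "foo", "mod-data-import-converter-storage", "mod-di-converter-storage")
for statement in statements:
    print(statement)
```

PostgreSQL clears an MD5-encrypted password when a role is renamed, which is presumably one case where a new ROLE password is needed.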


13 Comments

  1. I have hit the character limit on labels in Rancher/K8s already with mod-data-import-converter-storage and the length of my namespaces. Glad this is being addressed.

  2. My thoughts/comments from Slack...

    Re: the “mod-data-import-converter-storage rename” - will this change correspond to some Folio flower release?

    At Tamu Libraries we don’t currently use underscores in our tenant IDs, and if the module’s repo is renamed, and the module name in the okapi-install.json/install.json is also renamed, should be good?

    My K8s deployment script generates the module’s discovery entries for Okapi using the okapi-install.json file by the name of the module, what version it is, and what port is listed in the deployment descriptor section of the module descriptor - and just adds the dashes where appropriate.
    I.E. mod-data-import-converter-storage-1.11.4 becomes http://mod-data-import-converter-storage-1-11-4:port

    Since our tenant id is short, and our namespaces short, I’ve not hit the character limit yet in K8s. But I almost did in my testing with this hugely long, offensive module name. I imagine this is probably of great concern to hosting providers running multiple tenants in K8s.

    mod-data-import-conv-stge would be even better. Why not rename all modules with storage in their name to stge?

  3. Please do not generally encourage people to "rename a repository", especially just by following GitHub instructions. There are many ramifications when a module repository must be renamed if the module is in active use. The guidelines document for naming conventions encourages people to consider this very carefully. That document links to another document that lists the steps for when a repository must be renamed (including retaining the old repository).

  4. I gather that it is possible for the module name to be different to the repository name. However normally they are the same. Being different could also have some other effects. So the full renaming procedure would be better.

  5. I don't really agree with David here. Renaming a module is one big task; renaming a repository is another. Doing either one in no way necessitates doing the other, and we all need to be careful not to make unnecessary additional work for ourselves and (equally important!) each other.

    1. FWIW, I think David Crossley's (and my) concern has to do with CI tooling which depends on particular conventions being followed.

      1. I would like to know what those conventions are. To me it seems wrong-headed to impose constraints like this, and I would like to see them loosened.

        1. This is obviously a side issue that is worth discussing, perhaps just not here. Guidelines are actually fairly well documented in several documents maintained by David Crossley, e.g.:

          Asking for constraints to be loosened is reasonable, recognizing that generalizing tooling that was developed with specific assumptions can be a larger effort than it may seem to someone not involved in maintaining the tools.

  6. I have some concerns about attempting to provide a tenant ID migration facility in Okapi, for a few reasons.

    1. Okapi storage is completely separate from module storage. I believe many operators provide storage for Okapi in a separate database (partially addressed by the proposed skipSchemaRename option (query parameter?)).
    2. Okapi doesn't currently (AFAIK) have any notion of whether a module provides storage or the details of how that storage is provisioned. For example, RMB-based modules happen to create both a schema and a role using the tenant_mod_whatever naming convention, but other modules (mod-agreements, "Springway"-based modules) create only a schema and not a role.
    3. Okapi has no notion of whether the tenant ID is stored or used in any way in module data or cached in module memory.

    I don't think Okapi is well-positioned to successfully execute this kind of migration. It may be able to orchestrate a migration if the modules provide a system interface (e.g. the _tenant interface) to manage their own migration (recognizing that this approach is considerably more work and harder to maintain).

  7. Howdy all,

    Apologies I was not trying to "encourage" anything, just asking what the process might look like from a timeline perspective, and if it's going to be done - to give some more headroom for bumping up against character limits.

    Perhaps a better option might be some sort of validator tool then? If we do not want to be very prescriptive to the operator (but still have standards, as Wayne Schneider provided the links to) and have "loose" restrictions, we could ask (Okapi?) "Hey, I want to validate my naming conventions for A,B,C (tenant id/module name/schema name) against X,Y,Z (K8s labels/reserved names/schema requirements)."

    To hit on some of the points Wayne mentions re: Okapi tenant migration tool...
    At Tamu for Dev and Test Folio envs we do use entirely separate (containerized) Postgres DB instances for Okapi. For Pre and Prod we use the same instance of (VM) Postgres, but a different DB within that. I agree that Okapi currently has no way of knowing anything about what is stored in my module database... Seems like a large ask.

    One last way this could be mitigated, albeit through what lies between the chair and keyboard (we all know how reliable that tool is), is to have a giant red blurb in Okapi's readme documentation regarding tenant id/module/schema naming conventions that says "DON'T DO THIS!" Then make sure that is available/pointed out to operators and developers of Folio.

    1. "asking what the process might look like from a timeline perspective, and if it's going to be done - to give some more headroom for bumping up against character limits."

      Does this mean that you (or other system operators) have tenants in production (or production like environments) today with tenant IDs longer than 31 characters?

  8. Marc Johnson, that specific comment was more about the naming conventions for modules in an environment enabled for a tenant, and less about the tenant id itself. Currently, my tenant ids are all short, with no special characters.

    1. Jason Root Thank you for clarifying that for me.