

Since Okapi is the central message broker and all messages are directly
processed by or routed through it, I'd propose the following process:

  1. Every module should send its version number and all of its dependencies (with minimum version numbers) to Okapi
  2. Every module should be able to process three messages/answers from Okapi for every interaction:
    1. DELAY (for when a requested module is not available within a configured TTL)
    2. MISSING (for when a dependency is not met or a TTL timeout is reached)
    3. RESUME (for when the module is available or the dependency is met again)
  3. Every module should be able to respond to those messages in a fallback mode, either by dropping the local request and responding with "Service not available" to the user, or by delaying the request until a RESUME or MISSING is received from Okapi.
  4. Okapi should be able to deal with three messages from modules:
    1. DISCONNECT (for when the module gets shut down). This is a nice-to-have (modification: 2019-05-24)
    2. UPDATE (for when the module needs to restructure the database; Okapi should then answer all requests to that module with a DELAY while itself processing/routing all requests from that module)
    3. RESUME (for when normal operation mode is restored)
  5. Okapi should be able to deal with multiple versions of a module, sending requests only to the one with the highest version number (auto-dropout). Race conditions (an older module is still reading/writing the database while a newer module sends an UPDATE) should be resolved by Okapi: it responds DELAY to the UPDATE request and to all newer requests from other modules, grants the old module a few seconds of grace time to finish its run, then sends MISSING to the old module and proceeds with the UPDATE.
  6. Okapi or a module should be able to log/message the admin in case of unresolved dependencies.
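
Requirements 1, 2 and 5 can be sketched roughly as follows. This is only an illustration of the proposal, not how Okapi works today: the `Registry` class, the `OK` reply for normal routing, the tuple-based version numbers and the module names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Reply(Enum):
    OK = "OK"            # assumption: normal routing, not named in the proposal
    DELAY = "DELAY"      # target exists but is currently restructuring its data
    MISSING = "MISSING"  # target absent or below the caller's minimum version
    RESUME = "RESUME"    # target available again (not exercised in this sketch)


Version = tuple  # e.g. (2, 1); tuples compare element by element


@dataclass
class Module:
    name: str
    version: Version
    # dependency name -> minimum version, as sent on registration (requirement 1)
    requires: dict = field(default_factory=dict)
    updating: bool = False  # set while the module has an UPDATE in progress


class Registry:
    """Hypothetical bookkeeping Okapi would need for the proposed messages."""

    def __init__(self) -> None:
        self._modules: dict = {}

    def register(self, module: Module) -> None:
        self._modules.setdefault(module.name, []).append(module)

    def newest(self, name: str) -> Optional[Module]:
        # Requirement 5: only the highest registered version receives requests.
        return max(self._modules.get(name, []), key=lambda m: m.version, default=None)

    def unmet(self, module: Module) -> list:
        # Dependencies that are absent or too old; candidates for an admin log entry.
        missing = []
        for dep, min_version in module.requires.items():
            provider = self.newest(dep)
            if provider is None or provider.version < min_version:
                missing.append(dep)
        return missing

    def route(self, caller: Module, target: str) -> Reply:
        required = caller.requires.get(target, ())
        provider = self.newest(target)
        if provider is None or provider.version < required:
            return Reply.MISSING
        if provider.updating:
            return Reply.DELAY
        return Reply.OK
```

In this sketch a caller that requires version (2, 0) of a module gets MISSING while only version (1, 0) is registered, is routed normally once a (2, 1) instance registers, and gets DELAY while that newest instance is marked as updating.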

With those 6 requirements, updating modules and Okapi itself should be possible in two ways, with no or at least minimal downtime:

  • Shutting a module/Okapi down, updating the files and starting it up again.
  • Replacing the module with a higher version number on a different port/machine and decommissioning the old one after a short grace time.


Thoughts on that?


3 Comments

  1. Jo, I think some of this is handled by Interfaces: A single tenant can't have multiple versions of the same interface enabled. So in a sense, Okapi can deal with multiple versions of a module - and use the module version that is associated with the interface it provides that is enabled for the tenant.

    - I think if multiple back-end modules provide the same interface version, Okapi will use the newest version of that module. I could be mistaken about this, however.

    - Currently, there is nothing in place for migrating or upgrading schemas from one module version to the next. This does need work...

    Some sort of "service interrupt" message I think would be useful. You could tell Okapi to flag a module as unavailable while it upgrades, and if the endpoint is hit for that module, you would get a message in the UI or in the message queue that it is unavailable.

  2. Maybe I'm getting you wrong, but a GUI is the last thing I have in mind regarding dependency management. Since this is a lowest-level problem, it should be handled at the lowest level, and not by a GUI. The latter should be able to display responses of said modules, but not manage those dependencies, since there will be a point in time where a dependency for the GUI is not available and the whole shebang then comes crashing down, leaving you in an unpredictable meta-state between the updates. If you want to see that in action: Update a SLES 11 to 12 using the GUI. If you want to see how it's done right: Do the same with a Debian from 8 to 9. The latter handles dependencies at the lowest level and doesn't restart dependencies of the GUI until the GUI is restarted itself (by rebooting in this case). At no point in time will you have a failing GUI-hosted terminal (and thus a failing update, since the file handle closes unexpectedly with the failing GUI terminal), since the old system will continue to run although the new packages have already replaced the old ones on disk. Up until reboot, that is.

    And I'm not saying this is the way it works now, I'm just proposing an idea for how it could be managed. Current restrictions don't really apply here. 

    What I intentionally left out is the "I'm about to update" flag, because the way I see it, Okapi should be able to deal with a module dropping out unexpectedly, reason unknown. This also has a security background: If the module mod_auth is failing because some user was able to pipe bullshit to it, triggering a SEGFAULT, memory leak or buffer overflow and an immediate termination by the OS, the last thing Okapi should be doing is failing uncontrollably because there was no UPDATE message from the module. We have to think about bulletproof systems, and this is and will always be the responsibility of Okapi, as the service with the highest rights on everything. What I expect Okapi to be doing is waiting for a response until the TTL is over, and then answering MISSING to the querying module. Don't think about an additional nice way, if you already have to think about a brutal one.
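
    The "wait until the TTL is over, then answer MISSING" behaviour could look roughly like the sketch below. It is an illustration under assumptions, not Okapi code: `call_with_ttl` is a made-up helper, the module is stood in for by a plain Python callable, and a crash is modelled as an exception.

```python
import queue
import threading
from typing import Callable


def call_with_ttl(handler: Callable[[], str], ttl_seconds: float) -> str:
    """Ask a module for an answer; reply MISSING if it crashes or exceeds the TTL.

    `handler` is a stand-in for the real request to a module (assumption).
    """
    result: queue.Queue = queue.Queue(maxsize=1)

    def worker() -> None:
        try:
            result.put(("ok", handler()))
        except Exception as exc:  # a crashed module must not take the broker down
            result.put(("error", repr(exc)))

    # Daemon thread: a hung module cannot keep the broker process alive.
    threading.Thread(target=worker, daemon=True).start()

    try:
        status, payload = result.get(timeout=ttl_seconds)
    except queue.Empty:
        return "MISSING"  # no answer within the TTL, reason unknown
    return payload if status == "ok" else "MISSING"
```

    The point of the design is that the querying module sees the same MISSING answer whether the target crashed, hung or was shut down, so no separate "about to update" announcement is needed for correctness.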

    1. Jo, sorry: when I mention interfaces, I’m not talking about a GUI.

      See here: https://github.com/folio-org/okapi/blob/master/doc/guide.md#versioning-and-dependencies