Since Okapi is the central message broker and all messages are directly
processed by or routed through it, I'd propose the following process:

  1. Every module should send its version number and all of its dependencies (with minimum version numbers) to Okapi
  2. Every module should be able to process three messages/answers from Okapi for every interaction:
    1. DELAY (for when a requested module is not available within a configured TTL)
    2. MISSING (for when a dependency is not met or the TTL has expired)
    3. RESUME (for when the module is available or the dependency is met again)
  3. Every module should be able to respond to those messages in a fallback mode, either by dropping the local request and responding with "Service not available" to the user, or by delaying the request until a RESUME or MISSING is received from Okapi.
  4. Okapi should be able to deal with three messages from modules:
    1. DISCONNECT (for when the module gets shut down). This is a nice-to-have (modification: 2019-05-24)
    2. UPDATE (for when the module needs to restructure the database; Okapi should then answer all requests to that module with a DELAY while still processing/routing all requests from that module)
    3. RESUME (for when normal operation mode is restored)
  5. Okapi should be able to deal with multiple versions of a module, sending requests only to the one with the highest version number (auto-dropout). Race conditions (an older module writes/reads the database while a newer module sends an UPDATE) should be resolved by Okapi: it responds with DELAY to the UPDATE request and to all newer requests from other modules, grants the old module a few seconds of grace time to finish its run, and then proceeds with the UPDATE after sending MISSING to the old module.
  6. Okapi or a module should be able to log/message the admin in case of unresolved dependencies.
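
To make items 1, 5 and 6 a bit more concrete, here is a minimal Python sketch of the bookkeeping Okapi would need. It is only an illustration of the proposal, not Okapi's actual API: the `Registry` and `Module` classes, the tuple-based version format and the module names in the usage note are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Reply(Enum):
    """Replies Okapi sends to a module (item 2 of the proposal)."""
    DELAY = "DELAY"
    MISSING = "MISSING"
    RESUME = "RESUME"


# Assumed version format: (major, minor, patch), compared element-wise.
Version = Tuple[int, int, int]


@dataclass
class Module:
    name: str
    version: Version
    # Dependency name -> minimum required version (item 1).
    requires: Dict[str, Version] = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory stand-in for Okapi's module bookkeeping."""

    def __init__(self) -> None:
        self._modules: Dict[str, List[Module]] = {}

    def register(self, module: Module) -> None:
        """Record what a module reports about itself (item 1)."""
        self._modules.setdefault(module.name, []).append(module)

    def active(self, name: str) -> Optional[Module]:
        """Return the highest registered version of a module (item 5)."""
        candidates = self._modules.get(name, [])
        return max(candidates, key=lambda m: m.version, default=None)

    def unmet(self, module: Module) -> List[str]:
        """List dependencies that are absent or too old (input for item 6)."""
        problems = []
        for dep, minimum in module.requires.items():
            found = self.active(dep)
            if found is None or found.version < minimum:
                problems.append(dep)
        return problems

    def reply_for(self, module: Module) -> Reply:
        """MISSING while any dependency is unmet, RESUME otherwise."""
        return Reply.MISSING if self.unmet(module) else Reply.RESUME
```

For example, after registering a hypothetical `users` module in versions `(1, 0, 0)` and `(1, 2, 0)`, `active("users")` picks the `(1, 2, 0)` instance, and a `loans` module that also requires an unregistered `inventory` module would get `Reply.MISSING` until `inventory` shows up. The list returned by `unmet` is what would be logged or sent to the admin.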

With those six requirements in place, updating modules and Okapi itself should be possible in two ways, with no or at least minimal downtime:

  • Shutting a module/Okapi down, updating the files and starting it up again.
  • Replacing the module with a higher-version instance on a different port/machine and decommissioning the old one after a short grace time.
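
For the first route (shut down, update, start again), callers of the module would see DELAY, possibly MISSING, and then RESUME. A rough Python sketch of how Okapi could decide which reply to send is below. The `Availability` class and its method names are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow; a real implementation would use a clock and the configured TTL.

```python
from typing import Optional


class Availability:
    """Hypothetical per-module tracker for the DELAY/MISSING/RESUME replies."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self.down_since: Optional[float] = None  # None means the module is up

    def mark_down(self, now: float) -> None:
        """Record when the module went away (e.g. DISCONNECT or UPDATE)."""
        if self.down_since is None:
            self.down_since = now

    def mark_up(self) -> Optional[str]:
        """Module is back; return RESUME if waiting callers must be told."""
        was_down = self.down_since is not None
        self.down_since = None
        return "RESUME" if was_down else None

    def reply(self, now: float) -> Optional[str]:
        """Reply for a request to this module; None means route normally."""
        if self.down_since is None:
            return None
        if now - self.down_since < self.ttl_seconds:
            return "DELAY"    # still within the TTL, ask the caller to wait
        return "MISSING"      # TTL expired, caller should use its fallback
```

With a TTL of 30 seconds and a module that goes down at `now=100.0`, a request at `110.0` gets DELAY, a request at `130.0` gets MISSING, and `mark_up()` yields the RESUME that Okapi would send once the updated module is reachable again.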

Thoughts on that?