Async install

DRAFT - This is a work in progress

Overview

Module initialization/upgrade can take a long time, which in turn makes the install/upgrade service take a long time: multiple hours in some cases. This document captures thoughts from an investigation into whether to make install/upgrade asynchronous. It also covers the notion of being able to "continue on failure" and get a complete list of failures across all modules being installed/upgraded.

See OKAPI-804 and OKAPI-845.

Scope

The purpose of the async install is to address the problem of long tenant init operations. Okapi's own HTTP client has no timeout for these calls. However, two other parties can give up on a long operation: clients calling Okapi, and gateways sitting between Okapi and a backend module. The async install deals only with the former (clients calling Okapi). Solving the latter would require each module to perform its tenant init asynchronously, or to use some other means of transport.

Persistence

To address this, and also to be able to monitor the progress of an install that may call the tenant interface of multiple modules, it would be good to "persist" the operation.

Backwards Compatibility

The existing install and upgrade endpoints (/_/proxy/tenants/<tenant>/install and /_/proxy/tenants/<tenant>/upgrade) can be kept as they are, with no change to existing behavior. Both currently use POST to perform the operation, and we'd like to use the same method to initiate the async operation. It could be as simple as adding a query parameter, async=true. The RAML definition of the request will be identical for sync and async mode; everything that follows applies to async mode only.
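
A minimal client-side sketch of the proposed flag, in Python. It builds the POST for either endpoint and adds async=true only when asked, so the synchronous call stays byte-for-byte what it is today. The function name and the `async_mode` parameter are hypothetical. The install body shown is the usual list of `{"id", "action"}` entries. Upgrade is sent without a body.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_install_request(okapi_url, tenant, operation="install",
                          modules=None, async_mode=True):
    """Build (but do not send) the POST that starts an install or upgrade.

    Hypothetical helper: async_mode=True appends the proposed async=true
    query flag; async_mode=False yields today's synchronous endpoint.
    """
    if operation not in ("install", "upgrade"):
        raise ValueError("operation must be 'install' or 'upgrade'")
    url = (f"{okapi_url.rstrip('/')}/_/proxy/tenants/"
           f"{quote(tenant, safe='')}/{operation}")
    if async_mode:
        url += "?" + urlencode({"async": "true"})
    # The body (if any) is unchanged from the synchronous call.
    data = None if modules is None else json.dumps(modules).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    return Request(url, data=data, headers=headers, method="POST")


req = build_install_request("http://localhost:9130", "diku",
                            modules=[{"id": "mod-users", "action": "enable"}])
# Sending it would return immediately; the Location header is where to poll:
# with urllib.request.urlopen(req) as resp:
#     location = resp.headers["Location"]
```

Because the flag is purely additive, existing callers that never pass async=true are unaffected.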

High Level Approach

  • If async=true, install/upgrade returns a Location, and the status of the operation can be inspected with GET on that location. This could be extended with DELETE to signal "abort" and remove the information about the install/upgrade operation.
  • While the install is running, its state is persisted. It should remain persisted after it is done, at least as far as the calls to each module's tenant init are concerned. It could be kept for as long as Okapi is running (cluster-wide) and removed when Okapi/the cluster goes away.
  • To list all operations since the cluster was started, use GET on the same install/upgrade path that POST uses to initiate them. Likewise, DELETE on that path would remove the information for all operations.
  • It does not seem necessary to persist this to a database, though it could be done, and all install/upgrade operations could then be listed from there.
  • Split tenant init into multiple phases:
    • preInit: changes isolated to this module. Schema creation/updates and data migration scripts run during this phase.
    • commit: commit the changes made in preInit.
    • postInit: intended for business-logic changes, loading reference/sample data, etc. Calling other modules is allowed here; by the time postInit is invoked, all dependencies should already be satisfied.
  • Expand the /_/tenant API to accommodate the new multi-phase interface:
    • POST /_/tenant/preInst
    • POST /_/tenant/commit
    • POST /_/tenant/postInst
    • POST /_/tenant/abort
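
The Location/GET/DELETE bullets above describe a small in-memory store rather than a database. Here is a single-node Python sketch of that store. The class and method names are hypothetical, and the ids are sequential per process. A real Okapi cluster would need a cluster-wide shared map instead, which is not shown. DELETE here only forgets the operation; actually signalling "abort" to the running install is left out.

```python
import itertools
import threading


class InstallRegistry:
    """Hypothetical in-memory record of async install/upgrade operations.

    Entries live only as long as the process; nothing goes to a database.
    """

    def __init__(self, base_path):
        self._base = base_path.rstrip("/")  # e.g. /_/proxy/tenants/diku/install
        self._ops = {}                      # op id -> {module id: status}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def start(self, module_ids):
        """POST <base>?async=true: record the operation, return its Location."""
        with self._lock:
            op_id = str(next(self._ids))
            self._ops[op_id] = {m: "Pending" for m in module_ids}
        return f"{self._base}/{op_id}"

    def set_status(self, op_id, module_id, status):
        """Called by the install worker as each module moves between phases."""
        with self._lock:
            self._ops[op_id][module_id] = status

    def get(self, op_id):
        """GET <Location>: snapshot of per-module status, or None if unknown."""
        with self._lock:
            op = self._ops.get(op_id)
            return None if op is None else dict(op)

    def list_all(self):
        """GET <base>: every operation recorded since startup."""
        with self._lock:
            return {op_id: dict(op) for op_id, op in self._ops.items()}

    def delete(self, op_id):
        """DELETE <Location>: forget one operation."""
        with self._lock:
            return self._ops.pop(op_id, None) is not None

    def delete_all(self):
        """DELETE <base>: forget all operations."""
        with self._lock:
            self._ops.clear()
```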

Multi-phase Tenant Initialization

preInit

OKAPI invokes preInit in parallel for all modules being installed/upgraded. Because preInit only involves changes isolated to the module, dependencies need not be taken into account, which is what allows the calls to run in parallel. Modules must support rolling back the changes applied here, e.g. via DB transactions, copy-on-write, etc.
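
One of the rollback options mentioned is a DB transaction. A sketch of that option from the module's side follows. preInit opens a transaction, applies the schema/migration statements, and leaves the transaction open, and commit or abort ends it. SQLite stands in for the module's database only because, like PostgreSQL, it can roll back DDL. The class name is hypothetical. Keeping a transaction open across separate HTTP calls is itself a design cost that this sketch does not address.

```python
import sqlite3


class ModuleTenantPhases:
    """Hypothetical module-side handling of preInit / commit / abort."""

    def __init__(self, conn):
        # Expects a connection opened with isolation_level=None, so sqlite3
        # issues no implicit BEGIN/COMMIT and the boundaries below are the
        # only transaction boundaries.
        self.conn = conn

    def pre_init(self, statements):
        self.conn.execute("BEGIN")
        try:
            for sql in statements:
                self.conn.execute(sql)
        except sqlite3.Error:
            self.abort()  # undo the partial preInit before reporting failure
            raise

    def commit(self):
        self.conn.execute("COMMIT")

    def abort(self):
        if self.conn.in_transaction:
            self.conn.execute("ROLLBACK")


conn = sqlite3.connect(":memory:", isolation_level=None)
phases = ModuleTenantPhases(conn)
```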

commit

Details of how this works are TBD and depend on the rollback mechanism chosen for preInit.

abort

Instead of committing the preInit changes, here we're rolling them back.
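
Putting preInit, commit and abort together on the Okapi side, here is a Python sketch of one possible orchestration. It runs preInit for all modules in parallel and collects every failure rather than stopping at the first, in line with "continue on failure". It then commits all modules or aborts the ones that succeeded. The all-or-nothing policy is an assumption, since the commit details are still TBD. `pre_init`, `commit` and `abort` are hypothetical callables standing in for the module's /_/tenant calls.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pre_init_phase(modules, pre_init, commit, abort, max_workers=8):
    """Run preInit for all modules in parallel, then commit or abort them all.

    pre_init, commit and abort take a module id and raise on failure.
    Returns (status per module, failure message per module).
    """
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {m: pool.submit(pre_init, m) for m in modules}
        for m, fut in futures.items():
            exc = fut.exception()  # blocks until this module's preInit ends
            if exc is not None:
                failures[m] = str(exc)

    status = {m: "Failed" for m in failures}
    # Assumed policy: any preInit failure aborts every module that succeeded.
    finish, end_state = (abort, "Aborted") if failures else (commit, "Committed")
    for m in modules:
        if m in failures:
            continue
        try:
            finish(m)
            status[m] = end_state
        except Exception as exc:  # a failed commit/abort is reported, not raised
            status[m] = "Failed"
            failures[m] = str(exc)
    return status, failures
```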

postInit

OKAPI invokes postInit for the modules being installed/upgraded bottom-up, in dependency order. If module A depends on module B, A.postInit stays "Pending" until B.postInit is "Done". If module B fails, both module A and module B get status "Failed", along with some message/context.
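
The bottom-up rule can be sketched with Python's standard `graphlib` (3.9+). Modules are visited dependencies-first, and a module whose dependency failed is marked Failed without being called. The failure also propagates transitively. `run_post_init` and the `deps` shape are hypothetical. This runs postInit sequentially; `TopologicalSorter.get_ready()` could be used instead to run independent modules concurrently. A dependency cycle raises `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter


def run_post_init(deps, post_init):
    """Run postInit bottom-up: a module starts only after its deps are Done.

    deps maps each module id to the ids it depends on; post_init is a
    hypothetical callable that raises on failure.
    """
    order = list(TopologicalSorter(deps).static_order())  # dependencies first
    status = {m: "Pending" for m in order}
    messages = {}
    for m in order:
        failed_deps = sorted(d for d in deps.get(m, ()) if status[d] == "Failed")
        if failed_deps:
            status[m] = "Failed"
            messages[m] = "dependency failed: " + ", ".join(failed_deps)
            continue
        status[m] = "PostInit"
        try:
            post_init(m)
            status[m] = "Done"
        except Exception as exc:
            status[m] = "Failed"
            messages[m] = str(exc)
    return status, messages
```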

Schemas

install_progress

What OKAPI returns when retrieving the status/progress of an asynchronous install/upgrade.

TBD - Could be an extension of the TenantModuleDescriptor.json format with an addition of a status.  Another option is to encapsulate this in an object that also has an overallStatus field.

Status Enumeration

  • Pending
  • PreInit
  • Committed
  • Aborted
  • PostInit
  • Done
  • Failed
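
A Python sketch of the second install_progress option follows: a wrapper with an `overallStatus` field around per-module entries. Each entry reuses TenantModuleDescriptor-style `id`/`action` fields plus a status from the enumeration above. The class names, the optional `message` field, and the rule for deriving `overallStatus` are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(str, Enum):
    PENDING = "Pending"
    PRE_INIT = "PreInit"
    COMMITTED = "Committed"
    ABORTED = "Aborted"
    POST_INIT = "PostInit"
    DONE = "Done"
    FAILED = "Failed"


@dataclass
class ModuleProgress:
    id: str                       # module id, as in TenantModuleDescriptor
    action: str                   # e.g. "enable", as in TenantModuleDescriptor
    status: Status = Status.PENDING
    message: str = ""             # hypothetical failure context


@dataclass
class InstallProgress:
    id: str
    modules: list = field(default_factory=list)

    @property
    def overall_status(self):
        """Assumed rule: any Failed -> Failed; all Done -> Done; any Aborted
        -> Aborted; otherwise the least advanced phase still in progress."""
        states = [m.status for m in self.modules]
        if Status.FAILED in states:
            return Status.FAILED
        if all(s is Status.DONE for s in states):
            return Status.DONE
        if Status.ABORTED in states:
            return Status.ABORTED
        order = list(Status)  # declaration order = phase order
        return min((s for s in states if s is not Status.DONE), key=order.index)

    def to_dict(self):
        modules = []
        for m in self.modules:
            entry = {"id": m.id, "action": m.action, "status": m.status.value}
            if m.message:
                entry["message"] = m.message
            modules.append(entry)
        return {"id": self.id, "overallStatus": self.overall_status.value,
                "modules": modules}
```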

APIs

Interface | Method | Path | Request | Response | Description | Notes
okapi | POST | /_/proxy/tenants/<id>/install?async=true | | | Start an asynchronous install | Behavior stays the same if async=true is not provided
okapi | GET | /_/proxy/tenants/<id>/install | | | Get the status of an install |
okapi | DELETE | /_/proxy/tenants/<id>/install | | | Abort an install |
okapi | POST | /_/proxy/tenants/<id>/upgrade?async=true | | | Start an asynchronous upgrade |
okapi | GET | /_/proxy/tenants/<id>/upgrade | | | Get the status of an upgrade |
okapi | DELETE | /_/proxy/tenants/<id>/upgrade | | | Abort an upgrade |
tenant | POST | /_/tenant/pre | | | |
tenant | POST | /_/tenant/commit | | | |
tenant | POST | /_/tenant/abort | | | |
tenant | POST | /_/tenant/post | | | |
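
From the client's side, the okapi GET rows turn one multi-hour request into a series of short ones, which is what sidesteps client timeouts. Below is a Python sketch of a polling loop. It assumes the GET body carries an `overallStatus` field, as in the wrapper option under install_progress, and treats Done, Failed and Aborted as final. `wait_for_install` and `fetch_status` are hypothetical: `fetch_status` is any callable that performs the GET on the returned Location and decodes the JSON. Passing in `sleep` and `clock` keeps the loop testable without a running Okapi.

```python
import time

TERMINAL_STATES = {"Done", "Failed", "Aborted"}


def wait_for_install(fetch_status, poll_interval=5.0, timeout=None,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll the install/upgrade status until overallStatus is final.

    Raises TimeoutError if timeout (seconds) elapses first; the operation
    itself keeps running in Okapi and can be polled again later.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        progress = fetch_status()
        if progress.get("overallStatus") in TERMINAL_STATES:
            return progress
        if deadline is not None and clock() >= deadline:
            raise TimeoutError("install still running; poll the Location later")
        sleep(poll_interval)
```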