2020-02-14 - System Operations and Management SIG Agenda and Notes

Date

Attendees

Goals

  • Learn about database upgrade testing and other things

Discussion items

Time | Item | Who | Notes
5 | Welcome | Ingolf
  • Find a note taker
  • Question about the lack of documentation for running FOLIO while not using containers.
  • It was suggested that the Dockerfiles show how to build each module. In practice they do not: they are abstracted away and make assumptions about Jenkins and other tooling. The actual files exist in Git for each module. Jo has Jenkins running and is trying to feed configurations into it. Jason stated that Docker allows one to ignore all this.
  • The core team is responsible for this documentation. Ingolf will discuss with Jakub.
  • Jo is worried about a Docker installation failing and the entire system going down.
  • But: Docker will not be going out of business. Docker will probably even outlast FOLIO. Docker is a much lower risk than Okapi.

 Database upgrade testing

TAMU team: TAMU in-place upgrade, round #2

Round 1, an in-place migration during the Q4 upgrade, failed. The code was not tested. Jason figured out where it failed and logged defects.

Round 2: going from Q3 to Q4. The scripts for licenses and agreements work, but those for other modules don't.

Jason will add tickets, and he and John Melconian are getting back to the dev teams responsible for the upgrade scripts. The scripts don't leave the system in a good, stable state: it is not recoverable, there is no log of what went wrong, and there is no exception handling. What completed and what didn't is unknown.

There is no way to roll back.  It fails badly.  Not acceptable.

Rollback is a requirement

One can clone and test before placing in production, but we still need a graceful exit. Not sure a full rollback is reasonable at this stage of the project.
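A minimal sketch of the clone-and-test idea, assuming a plain PostgreSQL setup where `pg_dump`, `createdb` and `pg_restore` are available. The function name `backup_commands` and the database names are hypothetical; the function only builds the command lines so they can be reviewed before anything is run.

```python
from datetime import datetime, timezone


def backup_commands(dbname, scratch_db, when=None):
    """Build (not run) the commands to dump a database and restore it into a scratch copy."""
    when = when or datetime.now(timezone.utc)
    # Timestamped dump file so repeated test runs do not overwrite each other.
    dump_file = f"{dbname}-{when:%Y%m%dT%H%M%SZ}.dump"
    dump = ["pg_dump", "--format=custom", f"--file={dump_file}", dbname]
    create = ["createdb", scratch_db]
    restore = ["pg_restore", f"--dbname={scratch_db}", dump_file]
    return dump, create, restore
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The upgrade would then be tried against the scratch copy first, and the dump kept as the fallback if the in-place upgrade fails.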

We need to know what the upgrade is going to do and have a log of what it has done. If it fails, we need to know where and why, and we need to be able to re-run starting at that point.
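The behaviour asked for here could look roughly like the following sketch. It is not how the FOLIO upgrade scripts work today; `run_steps`, the step names and the JSON state file are assumptions made for illustration. Each step is logged, completed steps are recorded, and a re-run skips what already succeeded.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("upgrade")


def run_steps(steps, state_file):
    """Run (name, callable) steps in order, skipping steps already recorded as done."""
    state_path = Path(state_file)
    done = json.loads(state_path.read_text()) if state_path.exists() else []
    for name, action in steps:
        if name in done:
            log.info("skip %s (already completed)", name)
            continue
        log.info("start %s", name)
        try:
            action()
        except Exception:
            # Record where and why it failed, then stop; completed steps stay on disk.
            log.exception("failed at %s; completed so far: %s", name, done)
            raise
        done.append(name)
        state_path.write_text(json.dumps(done))
        log.info("done %s", name)
    return done
```

After a failure, the state file lists exactly which steps completed, and calling `run_steps` again with the same file resumes at the failed step. This gives visibility and a restart point, but it is not a rollback.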

We didn't know what had been done to the system: which steps had been completed and which had not. We need more verbose output. The system needs to tell us what is going on, where it failed, and why.

Right now, no visibility at all.

We might be relying too much on Okapi, so we have no insight. The scripts interact with the tenant API through Okapi, and there is no documentation. The scripts run automatically when you POST "enable" to Okapi.
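For reference, the "enable" call that triggers the scripts can be sketched as below. This assumes the tenant install endpoint and the `simulate` query parameter as described in the Okapi guide; the Okapi URL, tenant name and module id are placeholders, and the helper `build_install_request` is hypothetical. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_install_request(okapi_url, tenant, module_id, simulate=True):
    """Build the POST that asks Okapi to enable a module for a tenant."""
    url = f"{okapi_url.rstrip('/')}/_/proxy/tenants/{tenant}/install"
    if simulate:
        # Assumed dry-run switch: Okapi reports what it would do without doing it.
        url += "?simulate=true"
    body = json.dumps([{"id": module_id, "action": "enable"}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Sending it would be `urllib.request.urlopen(build_install_request("http://localhost:9130", "diku", "mod-example-1.0.0"))`. A secured installation would also need an `X-Okapi-Token` header, which is left out here.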


System / database backup strategies | Ingolf

All data must be in the shared Postgres outside the container. In MARCcat, does the database live inside or outside the container? It supposedly uses a database outside of the module.

Data import caches data internally for streaming (it uses RMB streaming). This might be an operational concern, and a problem for high availability, because the cache is not shared.

Postgres 12 and Java 8 are not compatible for embedded Postgres. The project will move to a newer version of Java at the right point in time.

Outstanding charges of the SIG | Ingolf

Create an architectural diagram → topic for another session

Missing documentation of the container images on hub.docker.com/u/folioci | Ingolf

See above. Ingolf will talk to Jakub.

Old software versions in the installation documentation, causing potential security vulnerabilities | Ingolf / hbz

Topics for next meetings

  • database clustering

Architectural diagram: Wayne and Jason are working on the one for the Kubernetes/Rancher installation. Do we need to create a more general diagram for those who do not plan to use Kubernetes/Rancher?

Action items

  •