2019-03-22 - System Operations and Management SIG Agenda and Notes

Date

Attendees

Goals

  • Deployment issues

Discussion items

Time | Item | Who | Notes
5 | Welcome | Ingolf
  • Find a note taker: Ingolf
  • New members:
    • Ulf Seltmann. Ulf works at University Libraries of Leipzig. He works on deploying FOLIO.
    • Jackie Gottlieb at Duke. Jackie is also in the Data Migration SubSig.
 

Changes to make FOLIO work better with Kubernetes, or some other orchestration tool

Dependency resolution

Database schema upgrades

Rollbacks, Recovery

Anton, Jason and Adam will get back to us.

They have formed a working group to work on the JIRA Issues (features, stories) for the deployment issues. (Deployment issues have been discussed in the SysOps meetings from Feb 08 through Mar 08).

On 2019-03-20, Julian Ladisch created RMB-347 in the RAML Module Builder project, following a debate on https://folio-project.slack.com/messages/C9BBWRCNB.

Maybe we need our own project or an epic in UXPROD or at least assign our label (sysops_mgt) to identify our issues.




Meeting Notes

jroot posted pioneering work on workload issues and response times to the group Slack; see yesterday's discussion there. The worst offenders for resource usage are mod-agreements and mod-licenses. Jason says these modules are built with Spring Boot and, unlike the other modules, can interact with Kubernetes.

Comparison of Grails and Spring Boot: https://objectcomputing.com/news/2017/06/28/grails-vs-spring-boot

Jason says mod-permissions is also memory-hungry. The database takes up a lot of compute time, and Inventory also generates a lot of workload.

Christopher Creswell found an issue with using pgpool2 for load balancing and replication this week. Two issues were created: RMB-347 and RMB-348.

Jason has posted Grafana screenshots (Kubernetes cluster monitoring). pgset-0 takes 2.37 nodes (of CPU time?), by far the most. The backend modules take much less.

There are also problems related to the speed of the UI.

Harry: We should start to file defects.

Jason loaded 137 K user records using mod_user_import, which took 8 hours in total. The UI for Patrons Code has not been optimized; it was last modified 6 months ago.
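For a rough sense of scale, the import figures above can be turned into an average throughput. This is only a back-of-the-envelope sketch, assuming "137 K" means 137,000 records and that the 8 hours is wall-clock time for the whole run; the function name is made up for illustration.

```python
def records_per_second(records: int, hours: float) -> float:
    """Average throughput over the whole run (records per second)."""
    return records / (hours * 3600)


# Assumed figures from the notes: 137,000 records in 8 hours.
rate = records_per_second(137_000, 8)
ms_per_record = 1000 / rate

print(f"{rate:.2f} records/s, about {ms_per_record:.0f} ms per record")
```

Under these assumptions the load averages a little under five records per second, or roughly 210 ms per record.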

Jason used a postgres set with 1 master and 1 or 2 replicas.

This is our Epic in JIRA: UXPROD-748. There are 4 issues in it, with no substructure yet.

Ingolf points out that the SysOps SIG has raised more issues that were also created in JIRA. They carry the label sysops_mgt and can be found here: "UXPROD-950" + "?jql=labels %3D sysops_mgt" (concatenate the two strings in a browser; JIRA does strange things with it).
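The concatenation described above can also be done programmatically. The following is a minimal sketch, not the SIG's actual tooling: the base URL is a placeholder (the notes do not spell out the issue link), and `build_label_query_url` is a hypothetical helper name. Only the `jql=labels %3D sysops_mgt` query comes from the notes.

```python
from urllib.parse import quote


def build_label_query_url(issue_url: str, label: str) -> str:
    """Append a percent-encoded JQL label filter to an issue URL."""
    jql = f"labels = {label}"
    # quote() encodes the spaces as %20 and "=" as %3D.
    return f"{issue_url}?jql={quote(jql)}"


# Placeholder host: substitute the real JIRA link to UXPROD-950.
url = build_label_query_url(
    "https://jira.example.org/browse/UXPROD-950", "sysops_mgt"
)
print(url)
```

Encoding the spaces as well as the equals sign keeps the link intact when it is pasted into tools that would otherwise split it at whitespace.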

These issues should also be subsumed under UXPROD-748.


Anton presents a slide deck: https://docs.google.com/presentation/d/1lT6O7oaEj0Yr9Q9qPcF9Pk2gL75jIqVrA61ifenRehM

There are 30+ GUI modules and 60+ backend modules. The latter are published as Docker containers.

Anton says there is much more to it than Docker and Kubernetes, and that one wouldn't want to do all the deployment work by hand. Anton and Jason favor working on supporting Rancher 2.0 as a container orchestration option. Rancher 2.0 includes CI/CD integration, monitoring and logging.

Anton says Rancher could sit on top of everyone's infrastructure. Rancher is very flexible and adaptable to any environment. Anton proposes that in FOLIO, we will contribute to the toolset. Let's break down the work into parts, then come back and share them.

Jason argues that the common denominator is Docker.

Ulf: We need to make the developers aware of the orchestration tool(s).

Jason: Many orchestration toolsets rely on Kubernetes using Docker. Docker Swarm and Mesos (and others) probably have the same issues as we encounter in Rancher now.

Ingolf says that we must definitely meet again in the small group (Anton, Jason, ...) to work out the JIRA issues. Ulf volunteers for the small group, as does Ingolf. Ingolf asks for other volunteers.

Here is a list of the JIRA tickets (and others) that Anton and Jason have found: Folio integration with an orchestration toolset

Action items
