2020-02-21 - System Operations and Management SIG Notes

Date

Attendees

Goals

Discussion items

Time | Item | Who | Notes
5 | Welcome | Ingolf
  • Someone needs to be the note taker; Ingolf took some notes.

Architectural diagram
  • Wayne & Jason are working on the architectural diagram for Kubernetes-Rancher installation.
  • A more general diagram, for those who do not plan to use Kubernetes/Rancher.

Explain concepts:

  • Containerization, Deployment, Gateway Okapi
  • Okapi clustering / Hazelcast
  • Database clustering, Crunchy postgres
  • Orchestration

Notes

Harry shows a block diagram that he once presented in a PowerPoint. It shows the frontend layer on top of the gateway and the backend modules (blue and orange ones). The storage layer was omitted, because presenting just a single database might suggest that the modular microservice concept is not being used. Ingolf thinks the storage layer should be added to the diagram again, as showing it does not imply that separate database instances for the different modules are impossible. In fact, we can already enumerate by hand five different databases for Folio:

  • one for Okapi
  • one for the Modules
  • one for MARCCat
  • one for Kafka
  • one for the Reporting LDP

The modules could each have a separate database; it is just not done this way at the moment. Ingolf will edit the diagram and add the storage layer to it.

Diagram at the following link.

https://drive.google.com/open?id=1Jq9piaUMWzx7sXLQBOuPtJvQcHvvGG4F


Link from Jo: Upgrading & Dependency resolving


Ingolf thinks that other concepts of modern container deployment also have to be explained to the community. Ingolf will collect the ideas and concepts that we have discussed in this SIG in the pages under Notes for Conceptual Architectural Diagram. Among those will be

  • load balancing vs. clustering

K8s does load balancing; Okapi clustering means that the Okapi instances talk to each other. The Okapi instances are spun up as containers, either by hand or by an orchestration tool like K8s, but in addition the Okapi containers need to know about each other. Okapi uses Hazelcast for this. Hazelcast is a plugin for Okapi.

Modules are replicated and load balanced by the orchestration tool, but they have no concept of clustering. The load balancer chooses one particular module container to do a job, and the other module containers then know nothing about that job. This is unlike Okapi, whose containers communicate over Hazelcast.
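The difference between the two ideas can be sketched in a few lines of Python. This is a toy model only: the class names, the job names and the shared dictionary are invented for illustration, and the dictionary merely stands in for what a cluster manager such as Hazelcast provides. It is not Okapi's or Hazelcast's API.

```python
import itertools


class ModuleReplica:
    """A module container: it only ever sees the jobs handed to it."""

    def __init__(self, name):
        self.name = name
        self.jobs_seen = []  # local to this replica, never shared

    def handle(self, job):
        self.jobs_seen.append(job)
        return f"{self.name} handled {job}"


class RoundRobinBalancer:
    """Stands in for the orchestration tool's load balancer."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)

    def dispatch(self, job):
        # Pick exactly one replica per job; the others are not told.
        return next(self._cycle).handle(job)


class ClusterNode:
    """A gateway instance that shares a membership map with its peers."""

    def __init__(self, name, shared_members):
        self.name = name
        self.members = shared_members  # the same object on every node
        self.members[name] = "up"

    def leave(self):
        self.members[self.name] = "down"


# Load balancing: each replica knows only its own job.
replicas = [ModuleReplica("mod-a"), ModuleReplica("mod-b")]
balancer = RoundRobinBalancer(replicas)
balancer.dispatch("job-1")
balancer.dispatch("job-2")

# Clustering: a change on one node is visible to every other node.
shared = {}
nodes = [ClusterNode("okapi-1", shared), ClusterNode("okapi-2", shared)]
nodes[0].leave()
```

After running this, `replicas[0].jobs_seen` contains only `job-1`, while `nodes[1].members` already shows `okapi-1` as down even though only `nodes[0]` was touched.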

TAMU scales out Okapi to 5 to 7 instances and might even expand to 14. Jason reports that scaling back down doesn't work well with Okapi. Hazelcast only keeps track of the state of the Okapi cluster. Jason has seen data loss while scaling down. It might be that, when there is no response from a spun-down container, the frontend sends the request multiple times until it arrives at another container. But really, Okapi should never shut down without completing all of its requests.
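What "never shut down without completing all requests" could look like is sketched below: on shutdown, an instance stops accepting new requests and waits until the ones in flight have finished. This is a generic draining pattern written for illustration, with a hypothetical class name; it does not describe how Okapi is implemented.

```python
import threading


class DrainingServer:
    """Tracks in-flight requests so shutdown can wait for them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._in_flight = 0
        self._accepting = True

    def try_begin(self):
        """Register a new request; refuse it once shutdown has started."""
        with self._lock:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Mark one request as finished and wake a waiting shutdown."""
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def shutdown(self, timeout=None):
        """Stop accepting work, then wait until nothing is in flight.

        Returns True if the server drained, False if the timeout expired.
        """
        with self._idle:
            self._accepting = False
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout)


server = DrainingServer()
started = server.try_begin()                # one request is in flight
worker = threading.Timer(0.05, server.end)  # it finishes shortly afterwards
worker.start()
drained = server.shutdown(timeout=2.0)      # blocks until the request ends
worker.join()
accepted_after = server.try_begin()         # new requests are now refused
```

An orchestration tool would also have to stop routing traffic to the instance before it drains; that part is outside this sketch.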

The Vert.x cluster doesn't handle that. If you upgrade Okapi (to a new Okapi version), you have to shut Okapi down and spin it back up. That is what the open source version of Hazelcast supports. The community relies on the open source version, and so does TAMU. With the Enterprise version of Hazelcast, however, you can do in-place upgrades of Okapi without shutting it down.


  • database clustering

Ingolf also thinks that the concept of database clustering should be briefly explained in the notes to the diagram. First of all, we can't run the database in a container that is replicated like the module containers, because that would duplicate its data, and the copies would then diverge. So, instead of containerization and orchestration, the concept for running a database under load is called database clustering. The data sets are replicated, not by a tool external to the database (as K8s would be), but by a tool built into the database. TAMU uses Crunchy for this; Lehigh uses pgpool. These tools replicate the data sets and distribute the load across the database instances while keeping the instances synchronized, with identical contents. Some instances are used only for reading; others also perform writes.
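The split between read-only instances and instances that also write can be illustrated with a small routing sketch. The host names and the `ClusterRouter` class are hypothetical, and the rule used here (statements starting with SELECT go to a replica, everything else goes to the primary) is a deliberate simplification. Real tools also have to deal with transactions, replication lag and failover.

```python
import itertools


class ClusterRouter:
    """Sends reads to replicas in turn and all writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # Look only at the first keyword of the statement.
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT":
            return next(self._replicas)
        return self.primary


router = ClusterRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM items"),
    router.route("SELECT * FROM users"),
    router.route("INSERT INTO items VALUES (1)"),
    router.route("select 1"),
]
```

Here the two reads go to `replica-1` and `replica-2`, the insert goes to `primary`, and the last read wraps around to `replica-1` again.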

When one deploys with a cloud service like AWS, there are managed services for this. In the Amazon cloud, this is called RDS (Relational Database Service).

Jason: In request management, dangling requests die; they are not "orphaned". (IK: probably neither is desired?)



If you splice something out of Okapi, be aware that this adds network delay and processing time. So in the end, if you splice out a data-heavy part of processing, or one that is timeout-sensitive or security-sensitive, you may actually make things worse, not better (thanks to Jo).



Chalmers Update | Harry

Chalmers is experiencing some problems with HRIDs in connection to the union catalog.

Overall Edelweiss performance for Checkin and Checkout is 30% lower compared to Daisy.


Software Support SIG | Harry

A Software Support SIG has spun off. This takes one charge away from the SysOps SIG. No one present objects to removing it from the charge of the SysOps SIG.

Anton is on both SIGs; he is the QA representative of the Support SIG.


Next week:

Pub Sub

Moving towards the use of PubSub will benefit us. It uses an event bus.
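For readers unfamiliar with the term, an event bus can be sketched as follows: publishers emit events by type, and the bus delivers each event to whichever handlers have subscribed to that type, so publishers and subscribers do not need to know each other. The `EventBus` class and the event name are invented for illustration; this is an in-process toy, not the API of the FOLIO PubSub module.

```python
from collections import defaultdict


class EventBus:
    """Delivers published events to the handlers subscribed to their type."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Call every handler for this event type; return how many ran."""
        handlers = self._subscribers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
received = []
bus.subscribe("item.checked_out", received.append)

delivered = bus.publish("item.checked_out", {"item": "42"})
ignored = bus.publish("item.checked_in", {"item": "42"})  # no subscribers
```

The publisher of `item.checked_out` never references the subscriber directly, which is the decoupling an event bus is meant to provide.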

Let's invite members from the PubSub team to the SysOps group!

Harry will invite Vince Bareau.

Next week: Vince Bareau on PubSub integration!

PubSub is being spliced out of Okapi. Okapi is doing a lot now, and this is one of the things that might not belong under Okapi's umbrella. But it is not a real performance issue at all.

Action items

  •