Architecture Blueprint Sessions - WolfCon 2020

Two sessions were used during WolfCon 2020 to discuss topics that relate to the Folio Architectural Blueprint. The Architecture Blueprint being defined as those longer term platform capabilities which are either strategic or foundational for future feature development. The outlook is forward looking, as compared to technical debt which looks to the past.

The format of these sessions was to allocate 10 minutes for each topic. After a brief introduction each topic was discussed with in the room. The goals of these sessions were:

inform on items being considered for a formal architectural blueprint
gather feedback on those items
obtain a sense of the perceived urgency around each
NOT to delve into the solution space for each item

Blueprint Items Discussed

Day 1:

Security (fallout of the Security Audit)
Refactoring Okapi
Tenant Management
Multi-Tenancy and Cross-Tenancy
Adopting PubSub
Adding Support for GDPR

Day 2:

Search Engine
Users and Permissions
Automation Engine
GraphQL
Database Connectivity

Not Discussed:

Codex
Inter-Folio Integration

Materials

Recordings of Sessions

Day 1:https://drive.google.com/open?id=15mAQu3u0CEbR1AX2NrjfL0A6mHHFuvM9

Day 2: https://drive.google.com/open?id=1ZR2PRTKHro9HRaNxzQHbDRGZBNu-OBhVArch

Session Notes

Thanks to Zak

Day 1:

many topics
forward looking vs backward looking (tech debt)

security audit fallout
    audit in February; expect results ~march 1
    weigh the urgency of the fallout vs other changes?
    there may be changes that roll together with other non-security changes
    urgency is high: expect to act on it basically as soon at it comes out
    we have a security policy but not yet a team; whoops; we should do that ASAP
        doc exists but not yet public
        will be publicized when ready
            security@folio.org, non-public jira project
        job of sec team to triage those jobs
            determine whom to notify, and when, and with how much detail
    emergency release process?
        not in place at present.
        hosting provider may provide a work-around to limit the impact
        at the same time, issue will be raised with the community to dev a fix
        once impl'ed, verified, then SP can deploy to hosting instance
        permanent fix may take a while...
        how does this interact with q'ly release schedule?
    general agreement: yes, respond to the security audit

refactoring okapi
    okapi has grown in role since original conception, impl
    many features
        proxy gateway
        tenant
        dep mgmt (build time)
        discovery system (installed modulues)
        tenant APIs (provisioning, upgrades)
        etc etc: timer, pre/post filters, etc
    goal: separate into multiple components
        high risk of making change to a monolith
        is security vulnerability
            e.g. setup requires elevated perms
            at the same time, stripes talks to okapi
            this is not ideal; these two are incompatible
        goal: restricted perms on runtime role
            elevated perms on setup role
        separate tools are more free to evolve because are independent
    splitting also allows these components to scale independently
    dep mgmt: does this compound that problem? still more apps, ugh
        (devil's advocate)
        these aren't apps; are tools.
        yes, proliferation is a problem, but separation of concerns is worthwhile
        can separate services without necessarily separating codebases
            this would be nice; could resolve some issues of compatibility
    don't have to separate each services into separate modules
        but some separation feels like a good thing
    general agreement to discuss this further, esp WRT security
        but uncertainty about the details of this refactoring
    other lurking issues: performance, efficiency
    what of shanghai's reqs: okapi must change to accomodate that, or wholesale changes to comm model
        separating discovery from gateway, don't have to change gateway
    maybe refactor is a misnomer here; we are discussing outward facing changes
        ... not simply internal refactoring
    a discussion about okapi is a platform discussion
        this is re-architecture, not refactor
        (this is scarier; has bigger impact on the rest of the platform)
        can change process model without having to change everything
        refactor now to make re-arch later less difficult?
    agree this is blueprint item
        timing may depend on sec audit
        discuss this in parallel/immediately after the sec audit?
    agree to discuss soon-ish

tenant management
    currently owned by okapi, at least in part
    1. admin component for provisioning, upgrading tenants
    2. runtime component for tenant registration
    WRT updates/migrations:
        part of modules?
        part of tenant API?
        devops sees different perspective
    may not matter where the tenant API impl lives, i.e. in okapi or elsewhere
    migrations need the elevated perms; can run on BE as admin tool if separated
    security aspect is the silver lining of this: isolating this from Okapi is good
    this is def. part of okapi refactoring
    v. impt for multi-tenancy arch.
    present issue is highly isolated tenant fx'ality; to coordinate across tenants, must build it separately
    does this separation make simple cases simple if you are not impl'ing multi-tenant?
        WRT tenant mgmt only, for single tenant is minor help
        for multi tenant, is HUGE help
    this also affects multiple tenants using one okapi
    this is dependent on multi-tenant libs planning to go-live in summer 2020
    agree this is a blueprint item
    important, but less so than security
    what is the price of this kind of change after a multi-tenant place goes live?
        depends on the change....
        maybe is huge: can't go live until have multi-tenancy, and need this first....
    maybe must include proxy as part of this?
    this also addresses the consortia problem
    investigate how reshare handles this?
        they run fully isolated tenants, exchange data across tenants
    need clear understanding of motivation for these items
    what is full multi-tenancy?
        multiple tenants sharing data; this does not exist at present
    agree to discuss ... "slater"
    does this even belong in okapi if it stays within Okapi?
        don't necessarily have to break into own service

full multi-tenancy
    how do we provide full multi-tenancy
    MT means sharing data across tenants
    want to separate initiating tenant from target tenant of an action
    this is maybe more comfortable as a roadmap item than tenant-mgmt
    must talk about this: may or may not imply LOTS of work
    there's a lot of complexity here
    permissions are currently tenant-scoped; need a new perm system
    talk about this sooner in order to prioritize it sooner
    security is an impt consideration here: isolation provides security; people want that
    the soln here must not undermine isolation elsewhere
    agree this is blueprint item
    discuss sooner

adopt pubsub
    have dev'ed P/S as part of source-record-storage
    so can handle comm between modules with event-based mechanism
        developed with this in mind. yay!
    decouples modules from one another
    overlaps with techdebt somewhat
        must reexamine some impls;
        can also
    discovery: realtime upgrades is holy grail
        so from vufind's view: this is HUGE
    very strong feelings about this being essential to future extensibility
        folio-core is hard to break apart right now
        having this is more of a hook module
        can have more uni-directional deps
        strong agreement all around
    do we impl/reimpl some integrations?
    most urgent is to make sure it is present somewhere?
        not part of edelweiss, is on master now?
    folijet to demo
    this is the beginning of Saga support
    could provide for distributed transactions in some circ functions
    strongly agree this is a blueprint item
    discuss soon, assuming is actually available

support for GDPR
    support is no brainer; must do
    mostly concern for hosting provider
    right of access, rectification, erasure, restrict processing, data portability
    can do some of this now but is highly manual process
    was always intended to make this easier by providing some APIs
    arch req not just a feature req because imposes some req's on modules
    does GDPR give you a timeline? believe yes but don't know
    this is crucial for this to be a non-US centric project
        not urgent, later is OK but is required
    Chalmers is live; this puts pressure on them.
        manual process is OK now, but won't be at scale

Day 2:

goal
    id issues NOT solve them
    then prioritize, set a timeline

search engine
    search built on top of postgres at present
    but will hit a tipping point where that starts to fail
    next level: dedicated search engine, e.g. elasticsearch
    is it functional, performant, easy to implement
    what about results display?
    universal search is a separate/related question
    is peristence engine not best for postgres? e.g. cassandra?
    this proposal suggests search as a bolt-on
        changing the persistence mechanism would be more integrated
    full-text engine vs rdbms have different strengths; may need both
    OLE used solr: problem then was then that solr indices became treated as source of truth
    postgresql is a swiss army knife of persistence
        rdbms, message queue, json storage...
    discussions WRT postgres services may make HA instances hard
        but adding other services, then have to solve those same problems for add'l tools
    any solution will take massaging to get good perf
    fulltext search is powerful ... but doesn't support localization
        will have to wrestle with this eventually
    NLAustralia: we have gone down the search engine road; happy with it
        do we have exp with large DBs?
        we know the lucene-based search indexes that perform fine
        if we lack the experience, we shouldn't research it, we should find ppl with exp
            this knowledge exists; don't spend time rediscovering it
    discovery layer will surely exist separately, and surely use a search engine
        but we are building an ILS, not a discover layer
    agreement this is impt
    agreement on sooner rather than later

users and permissions
    introduce tenant-level and system-level users
    every user in the system now gets perms assigned
    introduce role-based perms
    at system level, have batch operations
        don't want to create dedicated users for this, is hacky
    1. system-level users
    2. roles
        abstraction layer between users and roles
        will be impt when workflows; e.g. notifications assigned to roles
    can assign multiple psets to a single user
        psets aggregate permissions
        roles would aggregate users
    no way to cluster users at present
    UChicago: this would be helpful
    roles provide hierarchical permission assignment
    teams: ERM has some notion of this
        folio doesn't have any data-level protections
        perms are focused around endpoints only
        could roles help get us there?
        e.g. a group of people who should only be able to see orders with a certain attr
            e.g. only math-dept orders, only history-dept orders...
    agreement: should we split tenant level/system level users
    agreement: should explore roles
    this will only get harder if we wait
    agreement: system user notion is more impt than roles

automation engine / "workflow engine"
    engine required to provide automation on the backend; NOT user-visible
    should/when will we adopt such an engine
    could eventually help with user-visible things, e.g. tasklists
    nuance: site specific tweaks to processes are necessary
        right: we are talking about building the player piano; of course scroll is changeable
    are there concerns that necessary hooks in the system may be absent?
    camunda POC attached to GitHub:folio-org, but lots of progress in GitHub:tamu
        TAMU: using workflow for data migration
    this would be unique in this marketplace! whoop whoop!
    any relation to pubsub? ATM, handled separately
    agreement: this belongs on roadmap
    agreement: sooner; there is Q3/Q4 work scheduled that is related
    ought to be able to do anything manually that is being automated

graphQL
    prototype exists
    allows you to tailor a single request to get data in desired shape
    behind that is system to traverse the graph to assemble this
    initial look: optimization for UI; fewer requests
    subsequent look: want to streamline dataflow between modules, not just in UI
        right now can't get only IDs; get full records, which is less efficient
    could streamline APIs?
    at simple impl level, could simply add biz layer atop APIs
        still making API calls under the hood though...
        don't know about opportunity for actual optimization
    graphql could maybe provide some efficiencies

database connectivity
    we have module level connections currently
        if have multi-tenant system, any tenant uses single cx
    would be great to have more granular connectivity
        e.g. assign per-tenant cx
        e.g. interface-level cx, e.g. some data in local stores, some in cloud...
        e.g. GET/PUT/POST separation, separate cx to read-optimized/write-optimized stores
    for folks wanting to share infrastructure with separate data, this does not work well
    last bullet is solved outside of folio; we should not solve it ourselves
    secret mgmt is related, maybe?
    agreement this is blueprint item
    agreement this is urgent

codex!
inter-folio integration
    CALIS has explored this somewhat
    how do multiple folios interact? transparently to user.
    agreement this is later