Submitted Date:

Approved Date:

Status: ACCEPTED

Impact: MEDIUM

Overrides/Supersedes: NA

RFC: NA

Stakeholders

  • Performance Task Force (PTF)
  • #spring-force
  • #development Slack channel

Contributors

Approvers

Background/Context

  • Currently, 40+ RMB-based storage modules communicate with the same database endpoint for both reads and writes.
  • For scalability, cloud providers such as AWS offer a way to segregate read and write operations onto different database nodes, making it easy to attach read nodes and sync data between them.
  • FSE explored various proxy options, such as PgBouncer and Pgpool, to automatically split read and write traffic. None of them worked out.
  • RMB-348 was created four years ago, and in 2022 PTF took a stab at implementing it as discussed in the story.
  • With Core-Platform's guidance (thanks to Julian Ladisch and Adam Dickmeiss), RMB-348 was completed in Morning Glory and released in Nolana (RMB v35.0.0).
  • Data Import (MODSOURCE-540)
  • Three workflows underwent rigorous performance testing:
    • Check In
    • Check Out
    • Data Import (Create and Update MARC BIBs)

Assumptions

  • Similar performance improvements could be seen in other workflows under high CPU load. 

Constraints

  • The database technology will handle syncing data between the read and write nodes

Rationale

  • Scalability, performance, and cost savings

Decision

  • FOLIO has adopted the following approach to splitting database read/write traffic:
    • Storage modules create a read connection pool when two new environment variables, DB_HOST_READER and DB_PORT_READER, are present (see the sketch after this list).
    • The solution is not specific to any particular database technology.
    • Configuring the DB cluster to attach DB read nodes or to sync data between the nodes is out of scope for this work.
    • This solution can be implemented at the framework level, and has been in RMB v35.0.0.
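
As a rough illustration of this environment-variable contract, the following sketch shows how a module might build a writer pool from the standard DB_* variables and add a reader pool only when DB_HOST_READER and DB_PORT_READER are both set. This is a minimal sketch assuming the Vert.x PostgreSQL client (which RMB is built on); the class name, pool sizes, and fallback behavior are illustrative, not RMB's actual implementation.

    import io.vertx.core.Vertx;
    import io.vertx.pgclient.PgConnectOptions;
    import io.vertx.pgclient.PgPool;
    import io.vertx.sqlclient.PoolOptions;

    public class ReadWritePools {
      private final PgPool writerPool;
      private final PgPool readerPool; // same as writerPool when no reader is configured

      public ReadWritePools(Vertx vertx) {
        PgConnectOptions writer = new PgConnectOptions()
            .setHost(System.getenv("DB_HOST"))
            .setPort(Integer.parseInt(System.getenv("DB_PORT")))
            .setDatabase(System.getenv("DB_DATABASE"))
            .setUser(System.getenv("DB_USERNAME"))
            .setPassword(System.getenv("DB_PASSWORD"));
        writerPool = PgPool.pool(vertx, writer, new PoolOptions().setMaxSize(4));

        String readerHost = System.getenv("DB_HOST_READER");
        String readerPort = System.getenv("DB_PORT_READER");
        if (readerHost != null && readerPort != null) {
          // The reader shares all credentials with the writer; only host and port differ.
          PgConnectOptions reader = new PgConnectOptions(writer)
              .setHost(readerHost)
              .setPort(Integer.parseInt(readerPort));
          readerPool = PgPool.pool(vertx, reader, new PoolOptions().setMaxSize(4));
        } else {
          // No read replica configured: all queries go to the single endpoint.
          readerPool = writerPool;
        }
      }

      public PgPool writer() { return writerPool; }
      public PgPool reader() { return readerPool; }
    }

Because the reader pool silently falls back to the writer pool, modules written against this interface behave identically in single-node deployments where the new variables are absent.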

Implications

  • Workflows and module designs needed to consider the potential for stale data, since a read replica may lag behind the write node (see the sketch after this list)
  • Increased design complexity
  • Increased operational complexity
  • More database connections (and thus more resources) being needed within each module instance
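
To make the stale-data implication concrete, here is a hypothetical example building on the ReadWritePools sketch above; the table, statuses, and method names are invented for illustration. A workflow that must immediately observe its own write keeps both statements on the writer pool, while an independent lookup can tolerate replica lag and use the reader pool.

    import io.vertx.core.Future;
    import io.vertx.sqlclient.Row;
    import io.vertx.sqlclient.RowSet;
    import io.vertx.sqlclient.Tuple;

    public class ItemService {
      private final ReadWritePools pools;

      public ItemService(ReadWritePools pools) {
        this.pools = pools;
      }

      Future<RowSet<Row>> checkOut(String itemId) {
        // Read-after-write: both statements target the writer node, so the
        // follow-up SELECT cannot observe a stale row on a lagging replica.
        return pools.writer()
            .preparedQuery("UPDATE item SET status = 'CHECKED_OUT' WHERE id = $1")
            .execute(Tuple.of(itemId))
            .compose(updated -> pools.writer()
                .preparedQuery("SELECT * FROM item WHERE id = $1")
                .execute(Tuple.of(itemId)));
      }

      Future<RowSet<Row>> browseItems() {
        // A pure lookup with no dependency on a preceding write is safe on
        // the reader, accepting that results may slightly lag the writer.
        return pools.reader()
            .preparedQuery("SELECT * FROM item ORDER BY id LIMIT 100")
            .execute();
      }
    }

Deciding per query which pool to use is the main source of the added design complexity noted above: each workflow has to classify its reads as lag-tolerant or not.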

Please see Splitting Database Read/Write Traffics for more details. 

Other Related Resources

6 Comments

  1. Here at TAMU, we currently provide a Postgres HA stack for each of our FOLIO Pre and Prod environments. This operates on Patroni, an open-source Postgres HA project, with 1 primary and 2 replica VMs. HAProxy and PgBouncer sit in front of the Postgres VMs, providing connection pooling and directing traffic based on what is requested and which node is the current read/write Postgres node. All of that is spun up and maintained/managed by Ansible. We have a paid support subscription through Crunchy Data as a kind of CYA in case of failures or break/fix needs in Production.

    All of this is to say it's possible to run such a system as a self-hosted entity. However, it's not easy, and it's not cheap if you require support on your services.

  2. I should follow up by saying none of the Postgres replicas we have are used for FOLIO read-only ops, because, as was pointed out, code changes needed to happen on FOLIO's side for this to work properly.

    It has made fail-over for updating/upgrading, as well as fail-over for Postgres server-side issues, easier to manage. We also utilize the replicas for data dumps to other instances of FOLIO for testing/development and so on, as the primary Postgres VM does not get touched and FOLIO performance is not impacted.

  3. It seems like the title's inclusion of "in RMB" is a bit narrower than the decision section implies.

    The statement, "development for similar functionality in other frameworks (e.g., folio-spring-base)" indicates that this DR has ramifications beyond RMB. 

    1. I agree. I would prefer this ADR were framed as an architectural decision affecting all modules that use the database and the infrastructure of the system, rather than in terms of any particular implementation (either current or future).

  4. During the presentation I inaccurately claimed that the number of database connections in the database would double. This is not entirely true, because each write or read node maintains its own set of connections to the clients; therefore, there is no doubling of the number of connections on any single database node. However, the client modules will now have more database connection objects to maintain. This should not incur significant resource usage (e.g., memory) on the module side.

    1. This should not incur significant resource usage (e.g., memory) on the module side.

      It may incur a significant increase in client-side network connection and port usage.