2021-08-20 Meeting notes

Date

Attendees

Goals

Discussion items

Time | Item | Who | Notes
 | Review the Kanban board | Team |

min.io / s3 compatible file storage

How should FOLIO store files (like PDFs attached to orders, agreements, etc.)?

TC discusses whether FOLIO should accept min.io as an official part of the FOLIO platform:

Background:

mod-invoice-storage stores files (PDFs) in a JSONB property using base64 encoding:
https://github.com/folio-org/acq-models/blob/master/mod-invoice-storage/schemas/document.json
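
To illustrate what base64-in-JSON storage means for file size, here is a minimal sketch. The field names `name` and `data` and the sample payload are hypothetical and are not taken from document.json; the point is only that base64 text is about one third larger than the raw bytes.

```python
import base64
import json


def wrap_as_json_document(name: str, payload: bytes) -> str:
    """Embed a binary payload in a JSON document as base64 text.

    The field names are hypothetical, chosen only for this illustration.
    """
    encoded = base64.b64encode(payload).decode("ascii")
    return json.dumps({"name": name, "data": encoded})


def unwrap(document: str) -> bytes:
    """Recover the original bytes from the JSON document."""
    return base64.b64decode(json.loads(document)["data"])


# Stand-in for a small PDF; any byte string behaves the same way.
pdf_bytes = b"%PDF-1.4\n" + bytes(range(256)) * 12
document = wrap_as_json_document("invoice.pdf", pdf_bytes)

raw_length = len(pdf_bytes)
encoded_length = len(json.loads(document)["data"])
# base64 emits 4 characters for every 3 input bytes (rounded up).
overhead = encoded_length / raw_length
```

With this payload `overhead` comes out close to 4/3, before any compression the database may apply to the stored JSONB value.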

mod-agreements and mod-licenses store files (PDFs) in pg_largeobject without any tenant or module separation. The ERM development team doesn't use permissions for tenant separation, and it rejected the request to switch to a PostgreSQL solution like bytea that provides tenant and module separation through schemas (ERM-1779). Instead, the ERM development team wants to move to an external solution (UXPROD-3172) like min.io (or some other S3-compatible file storage).

mod-data-export-worker already uses min.io, and the FOLIO Ansible scripts install min.io for this module.

PostgreSQL supports storing binary files: https://wiki.postgresql.org/wiki/BinaryFilesInDB

  • "When should files be stored in the database? The common suggestion here is when the files have to be ACID."

  • "When is it bad idea to store binary files in the database? Very large files (100MB+), where performance is critical to the application."

  • Do smaller binary files result in bad performance? No, because "bytea and text data types both use TOAST (details here)."
  • PostgreSQL has two ways to store binary files:
    • bytea: This is a regular column type that can be used in any table. It works as usual with the schema separation, where each combination of module and tenant has a dedicated database schema with a dedicated role that can access only its own schema.
    • pg_largeobject: This is a system table; PostgreSQL has exactly one per database. Access to a large object can be restricted to a role, which allows for module and tenant separation.

    • bytea with TOAST "makes the large object facility partially obsolete." (https://www.postgresql.org/docs/current/lo-intro.html)
  • For a detailed discussion see the BinaryFilesInDB link above.
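
The schema separation described for bytea can be sketched as follows. This is an illustration under stated assumptions, not the statements FOLIO actually runs: the `<tenant>_<module>` naming rule, the `file` table, and its columns are hypothetical. The sketch only generates the SQL text so that it can be inspected without a database.

```python
import re
from typing import List

# Conservative pattern for an unquoted PostgreSQL identifier.
_IDENTIFIER = re.compile(r"[a-z][a-z0-9_]*")


def schema_name(tenant: str, module: str) -> str:
    """Build one schema/role name per tenant and module (assumed naming rule)."""
    name = f"{tenant}_{module}".lower().replace("-", "_")
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def separation_statements(tenant: str, module: str) -> List[str]:
    """Return SQL that gives a tenant/module pair its own role and schema."""
    name = schema_name(tenant, module)
    return [
        # Dedicated role for this tenant/module combination.
        f"CREATE ROLE {name} LOGIN",
        # Dedicated schema owned by that role.
        f"CREATE SCHEMA {name} AUTHORIZATION {name}",
        # Meant to be run while connected as the new role, so the role
        # owns the table; the binary content lives in a bytea column.
        f"CREATE TABLE {name}.file (id uuid PRIMARY KEY, content bytea NOT NULL)",
    ]


statements = separation_statements("diku", "mod-invoice-storage")
```

Because every table, including the one holding the bytea column, lives inside the schema of its tenant/module role, the existing role-based separation covers binary content without extra mechanisms.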

There are no performance issues with storing binary files in PostgreSQL.

A non-PostgreSQL option for storing binary files has been requested because it allows backups to be split into binary files and regular record data.

The min.io server for multi-tenancy is licensed under the GNU Affero General Public License Version 3 (AGPLv3). This changed in April 2021; before that it was Apache 2. The min.io server for bare-metal or single-tenant use and the MinIO Java SDK client continue to be released under Apache v2.0.

Proposal for a Security Team decision:

  • Binary files must be stored with strict tenant and module separation.
    • The TC has already discussed tenant separation and has decided that it is required (see TC 2021-08-18 Meeting notes).
      • The security team agrees with the decision.
  • A FOLIO MinIO security guide for developers and sysOps must be published and reviewed by the security team before more modules start using it.
    • e.g. including guidance on how to do the tenant/module separation
    • The tech leads group will discuss this as noted during the TC meeting (see TC 2021-08-18 Meeting notes)

Reason for this decision:

    • This is to support multi-tenant installations.
    • This is to support modules the sysOp doesn't fully trust (as explained in https://dev.folio.org/faqs/explain-database-schema/).
    • Adding a new storage facility can easily create security issues if not done properly. FOLIO hasn't fixed the advanced database privileges issue with PostgreSQL (FOLIO-1935) yet.

Action items