2019-08-12 RPWG Meeting notes

Date

Attendees

Discussion items

Item | Who | Notes
Attendance
Update Attendee List
Action Items
Review and update Action Items
LDP road map | Nassib
  • Considerations in developing this first road map for LDP releases:
    • Key requirements:
      • Many reports have been ranked by institutions for "go live (MVP)", which means that they will have to be written and debugged within the next few months.
      • Significant SQL reporting against FOLIO's internal operational database is not recommended, although this has often been done with traditional integrated library systems. One reason is that FOLIO modules are not required to share a database, so cross-domain table joins on the operational database may break irreparably in the future.
      • Reporting analysts want an easy and familiar query model, and one that works with common reporting tools.
      • Reporting analysts will want queries to run efficiently.
      • Reporting SIG members have repeatedly communicated the need for access to all FOLIO data for reporting purposes.
    • Challenges and constraints:
      • The number of storage interfaces in FOLIO will soon exceed 100.  Although ETL for most of them is relatively simple, synchronizing such a large number of tables reliably on schema changes requires very active coordination with FOLIO, which so far has not proved to be possible.
      • The LDP's critical dependencies on FOLIO core development, which were requested and flagged as critical beginning in late 2018, will likely not be addressed until mid-2020 or later, based on discussions with the FOLIO capacity planning and project management groups.
      • FOLIO is requesting that "go live (MVP)" features be completed by January 2020, to be released in Summer 2020.
      • Developer resources appear likely to remain at roughly 1 FTE or less on average, spread across several part-time developers.
  • LDP 1.0 proposed core features, for "go live (MVP)":
    • Support for ad hoc, cross-domain queries on all, or a very large proportion of, the FOLIO data extracted from storage modules. We ask this working group and the Reporting SIG to help us determine, in the near future, the definition of "all FOLIO data" required for inclusion in the LDP.
    • Include MARC records extracted from FOLIO SRS and transformed for easier querying.
    • Historical data will be retained in the LDP but not transformed into a single schema.
    • The recommended refresh schedule for the LDP database is once per day from the FOLIO operational database.
    • Support for optional anonymization of personal data.  The Data Privacy WG will propose requirements for this feature, in particular which fields should be anonymized.
    • Implementation guidelines (documentation) for local tables.
    • Support for PostgreSQL and Redshift database systems.
    • Proposed data model design for LDP 1.0.
  • Schedule:  LDP Beta (feature complete) in January 2020, LDP 1.0 in Summer 2020.
  • LDP beyond 1.0:  Historical queries using a single schema, ETL, and full relational or star schema can be implemented for later releases but are highly dependent on the identified critical dependencies and availability of developer resources.
  • Current support for query development:
    • The test database is now using the proposed data model design for LDP 1.0.
    • Also in the test database are data needed for the Circ Item Detail query, in the following tables:
      • groups
      • holdings
      • instance_types
      • instances
      • institutions
      • items
      • loans
      • locations
      • material_types
      • service_points
      • temp_loans
      • users
    • Additional tables will soon be added to support writing report queries as prioritized by this working group.
    • A few members of our development team will set aside time to assist this working group with SQL if needed.
    • We still urgently need test data that we can extract from a running FOLIO installation.
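
As a rough illustration of the kind of cross-domain query the test tables listed above are meant to support, the sketch below joins loans, items, material_types, and users. It is not the Circ Item Detail query itself. The column names (item_id, user_id, loan_date, barcode, material_type_id, username, name) and the sample rows are assumptions made up for illustration and may not match the proposed LDP 1.0 data model. An in-memory SQLite database stands in for PostgreSQL or Redshift so that the example runs without a server.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a few hypothetical LDP-style tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Column names are illustrative assumptions, not the LDP schema.
        CREATE TABLE users (id TEXT PRIMARY KEY, username TEXT);
        CREATE TABLE material_types (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE items (
            id TEXT PRIMARY KEY, barcode TEXT, material_type_id TEXT
        );
        CREATE TABLE loans (
            id TEXT PRIMARY KEY, user_id TEXT, item_id TEXT, loan_date TEXT
        );

        INSERT INTO users VALUES ('u1', 'alice'), ('u2', 'bob');
        INSERT INTO material_types VALUES ('m1', 'book'), ('m2', 'dvd');
        INSERT INTO items VALUES ('i1', '1001', 'm1'), ('i2', '1002', 'm2');
        INSERT INTO loans VALUES
            ('l1', 'u1', 'i1', '2019-08-01'),
            ('l2', 'u2', 'i2', '2019-08-05'),
            ('l3', 'u1', 'i2', '2019-08-10');
        """
    )
    return conn


def circ_item_detail(conn):
    """Join circulation, inventory, and user tables in a single query."""
    query = """
        SELECT l.id AS loan_id,
               l.loan_date,
               i.barcode,
               mt.name AS material_type,
               u.username
        FROM loans AS l
        JOIN items AS i ON i.id = l.item_id
        JOIN material_types AS mt ON mt.id = i.material_type_id
        JOIN users AS u ON u.id = l.user_id
        ORDER BY l.loan_date, l.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in circ_item_detail(build_sample_db()):
        print(row)
```

A join like this spans the circulation, inventory, and users domains, which is the case that the key requirements above flag as fragile on the operational database and that the LDP is intended to support.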
Next Reporting Data Models to Deliver | All

Shall we build these reporting data models next?

  1. Circ Item Detail: https://folio-org.atlassian.net/wiki/display/RPT/Circulation+Item+Detail+Report+Prototype
  2. Services Usage: https://folio-org.atlassian.net/wiki/display/RPT/Services+Usage+Report+Prototype
  3. Shelf List Location: https://folio-org.atlassian.net/wiki/display/RPT/Shelf+List+Location+Report+Prototype

Questions:

  • What do RPWG members think of these (above) as our current highest priorities?
  • Would the prototype developers like to develop the SQL queries for each of these?
For our next meeting...

Future Topics:

  • Definition of "all FOLIO data" required for inclusion in the LDP.
  • Evaluate the data model of the LDP's MARC implementation.
  • Identify representative report queries on MARC records.

Next meeting date:


Using GitHub for developing SQL report queries | Nassib

Demonstration of how to contribute a new SQL report query, or a modification of an existing query, using the shared community space in GitHub that has been set up for this purpose.

Refer to "Using GitHub to develop report queries for folio-analytics" for written instructions.


Action items

  • Angela to work with a small group to look at Resource Access clusters
  • Sharon to work with a small group to look at Resource Management clusters