20181024 Technical Council Meeting Notes

Date

Meeting recording

Attendees

Discussion items

Time | Item | Who | Notes
15 min | Workflow PoC update (Presentation) | Peter Murray
  • Peter to reference the Workflow PoC planned deliverables in a future update
  • The use of Spring instead of RMB/Vert.x creates a decision point once the PoC is done. Ideally we wouldn't introduce a divergent technology into a core module.
5 min | DevOps Needs being presented to PC tomorrow | Jakub Skoczen
  • Resources are scarce while demands on DevOps are increasing. Plea to the PC to find more resources.
45 min | Reporting | All

Summary of prior TC email thread and discussion in the 10/24/2018 meeting:

    • The TC has not published a technical blueprint for FOLIO Reporting needs
    • Prior to the TC's existence there was a de facto plan for reporting, discussed in this presentation, which was delivered to the Reporting SIG and community in Madrid in January of 2018. This design featured a "pipe" (a message queue) out of FOLIO through which events would flow into a Data Lake, and Edge APIs that would resolve references to data referred to in the events. The intent was to pour these data into a Data Lake, which would facilitate a variety of reporting technologies and methods, preserving flexibility and choice for FOLIO adopters. Note that a data warehouse was intentionally left out of that solution (see the issues with a data warehouse mentioned later). This approach was also discussed in Durham at WOLFCon, and is being planned per the Asynchronous Event Service, to be implemented in Q4.
    • The Library Data Platform document that Nassib Nassar created offers a solution that is somewhat different from what prior discussions had assumed. Principally, the solution provides for a data warehouse, populated primarily by batch exports out of FOLIO modules. Two aspects raise questions or aren't necessarily consistent with prior plans: the use of Batch Export/ETL (vs. streaming events and calls to Edge APIs), and the direct inclusion of a Data Warehouse (no Data Lake first). Questions arise as to whether this is a change in the de facto plan for reporting or rather a planned implementation outside of FOLIO. For discussion's sake we'll refer to the solution that includes a data warehouse as LDP.
    • Perhaps some of the reasons the Reporting SIG wanted a 'Reference implementation of a Data Warehouse Solution' included: 
      1. While the members may have understood the concepts behind the de facto approach, their experience didn't allow them to see how to actually achieve the reporting solutions that any implementer of FOLIO would need; they wanted an end-to-end implementation that they could evaluate.
      2. The desire to have a shared and familiar set of tools that allow institutions to access the data that is inside FOLIO and run their library.
    • It seems that there isn't close enough alignment and common understanding between the TC and the Reporting SIG.
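
The de facto design summarized above (events leaving FOLIO through a message queue, references resolved through Edge APIs, results landing in a Data Lake) can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory queue, the `edge_api_lookup` function, the `loan.created` event type, and the record shapes are all hypothetical and are not taken from any FOLIO module or from the Asynchronous Event Service.

```python
import json
import queue

# Hypothetical stand-in for module storage that an Edge API would expose.
RECORDS = {
    "item/1": {"id": "item/1", "title": "Example title"},
    "user/7": {"id": "user/7", "group": "staff"},
}

def edge_api_lookup(ref):
    """Hypothetical Edge API call: resolve a reference to the full record."""
    return RECORDS.get(ref)

def resolve_event(event):
    """Return a copy of the event with its references replaced by records."""
    resolved = dict(event)
    resolved["refs"] = {
        name: edge_api_lookup(ref) for name, ref in event["refs"].items()
    }
    return resolved

def drain_to_lake(pipe, lake):
    """Consume every queued event, resolve it, and append it as a JSON line."""
    while True:
        try:
            event = pipe.get_nowait()
        except queue.Empty:
            break
        lake.append(json.dumps(resolve_event(event), sort_keys=True))
    return lake

# The "pipe" out of FOLIO, modeled here as an in-memory queue (an assumption).
pipe = queue.Queue()
pipe.put({"type": "loan.created", "refs": {"item": "item/1", "user": "user/7"}})

# The Data Lake, modeled here as a plain list of JSON lines (an assumption).
lake = drain_to_lake(pipe, [])
```

The point of the sketch is that the lake stores self-contained, loosely structured records, so each institution can load them into whatever reporting tool it prefers.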


    • There are technical issues relating to the creation of a data warehouse:
      • The primary issue is the tight coupling of the data warehouse's schema and ETL to the schema of the origin data sets. This maintenance needs to be accounted for. Some feel that this maintenance activity should NOT be part of the FOLIO project.
      • FOLIO is Open Source and there are not many robust Analytics tools that can be included under its Apache 2 license
      • FOLIO is based on micro-services and a monolithic database is not available
      • FOLIO operates on data which may not be stored in FOLIO but are externally linked instead (e.g. KB, or Student Information Systems)
    • The creation, operation and maintenance of a data warehouse should not be underestimated, nor can we ignore the fact that each implementation will likely be somewhat unique, which calls into question the value/merit of a centralized 'FOLIO' reference data warehouse.
    • There are many technical choices for Data Lake and Data Warehouse implementations, with more emerging every year. Institutions that adopt FOLIO likely already have their preferred analytics tool platforms in place. It might be tough for FOLIO to choose one that would fit enough of the potential community to make sense. In other words, whatever choice might be made inside FOLIO could be 'wrong' to a cross-section of FOLIO adopters.
    • A central philosophy in FOLIO is to stay away from monolithic solutions and enable precise single-purpose modules that are excellent at the singular thing that they do. The divide-and-conquer approach of FOLIO providing a mechanism to output data (message queue/edge apis) and there being many ways to consume that data for specific reporting needs seems to fit this pattern.
    • FOLIO Service Providers (like EBSCO, Index Data, Bywater, others) may include some set of analytics tools and capabilities as part of their services, and in fact this may be a way of differentiating one service provider from another, as well as providing choice for the FOLIO community.
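
The schema-coupling concern raised in the list above can be illustrated with a small sketch. The column mapping, field names, and the renamed `dueDateTime` field are all hypothetical; the sketch only shows why a warehouse ETL job has to be maintained in step with the source modules' schemas.

```python
# Hypothetical column mapping maintained by the warehouse's ETL job:
# warehouse column -> field name in the source module's records.
ETL_MAPPING = {"loan_id": "id", "due": "dueDate"}

def transform(source_row, mapping=ETL_MAPPING):
    """Map a source record onto warehouse columns; fail loudly on schema drift."""
    missing = [src for src in mapping.values() if src not in source_row]
    if missing:
        raise KeyError(f"source schema changed, missing fields: {missing}")
    return {col: source_row[src] for col, src in mapping.items()}

old_row = {"id": "l-1", "dueDate": "2018-11-01"}
# Assume the source module later renames a field.
new_row = {"id": "l-1", "dueDateTime": "2018-11-01T00:00:00Z"}

warehouse_row = transform(old_row)

try:
    transform(new_row)
    drift_detected = False
except KeyError:
    # The ETL mapping must now be updated by whoever maintains the warehouse.
    drift_detected = True
```

Every such upstream change forces a matching change in the mapping, which is the maintenance cost the discussion says must be accounted for.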

    • To a large extent, the question is what is "part" of FOLIO versus what is not. No one disputes that a data warehouse will be a valid design and even necessary for some institutions. The question is really whether that is part of FOLIO or not.
    • Another question raised was whether the LDP is being created as a "part" of FOLIO. Perhaps it isn't ... but if not part of FOLIO, what is it 'part of'?
    • The fact that Batch ETL as opposed to streaming ETL is so much a part of LDP is also something that wasn't anticipated. So far FOLIO's core modules have been able to be modified to support the Batch function. Further technical discussion may be warranted.


    • The LDP end-to-end prototype is scheduled to finish in early November. We are eager to see what was learned and what is recommended for next steps.


  • See action items

Action items

  • Mike Gorrell to draft an initial Tech Blueprint for FOLIO Reporting, to be finalized by the TC in prep for a conversation with the Reporting SIG
  • Tod Olson to invite key Reporting SIG members to a future TC meeting to receive presentation of Tech Blueprint for FOLIO Reporting and for discussion to ensure we're on the same page