2021-07-26 Reporting SIG Meeting notes

Date: 2021-07-26

Attendees

Present?  Name                 Organization
X         Sharon Beltaine      Cornell University
          Nancy Bolduc         Cornell University
X         Axel Doerrer         University Mainz
          Stefan Dombek        Leipzig University
X         Jennifer Eustis      U. Massachusetts Amherst / Five College
          Ingolf Kuss          hbz
          Jesse Lambertson     University of Chicago
X         Eliana Lima          Fenway Library Organization
X         Linda Miller         Cornell University
          Nassib Nassar        Index Data
          Elena O'Malley       Emerson
X         Shelley Doljack      Stanford University
          Tod Olson            University of Chicago
X         Jean Pajerek         Cornell University
X         Michael Patrick      The University of Alabama
X         Eric Pennington      Texas A&M
X         Scott Perry          University of Chicago
          Natalya Pikulik      Cornell University
          Vandana Shah         Cornell University
X         Amelia Sutton        U. Massachusetts
X         Simona Tabacaru      Texas A&M
X         Kevin Walker         The University of Alabama
X         Angela Zoss          Duke University
X         Lloyd Chittenden     Marmot
Guest     Ian Walls            ByWater Solutions


Discussion Items

Item

Who

Notes

Attendance & Notes (Angela)

Attendance & Notes

  • Today's attendance-taker: Linda Miller
  • Today's note-takers: Team Leads for project updates
Presentation: ByWater reporting solution (Ian Walls)

https://github.com/bywatersolutions/folio-reporting

  • ByWater has four partners, mostly small; the largest has about 200k bibs. Staffing is also limited: 2 or 3 librarians total.
  • EBSCO takes care of hosting
  • LDP is not a component of these environments
  • Some blockers to using the LDP: the extra infrastructure of a separate database, and the data sync process (a lot of overhead at this scale)
  • Still have reporting needs
  • EBSCO is coming out with Panorama, which is good for analytics but not necessarily for reporting ("I have things I need to act on")
  • Working through the APIs was challenging; FOLIO is designed to be API-first, which means that it's not data-first
  • The APIs don't cover everything, don't take business logic into account, and don't always support bulk operations
  • Querying the data structures in Postgres directly gives a somewhat faster response, but the data are stored in separate schemas (harder to query, no foreign key constraints) and are largely held in JSON objects (harder to query directly)
  • Decided to use views to normalize the data inside the FOLIO Postgres database
  • John at UC had already created these views; Ian has generalized them to work for any tenant. A script takes your tenant information and generates SQL statements for your DB admin to run
  • No separate database is needed and the data are real-time, but a large query could accidentally take down production
  • This is read-only
  • So far no runaway queries have occurred; if needed, the views could be materialized to cache the results (basically making a copy)
  • The views put data from all modules into the same schema
  • Also had to solve the problem of giving people access to the data: ByWater uses Metabase, which has cut the time spent answering support questions by a factor of 10
  • Goal is to make reporting self-service for the partner librarians: a dashboard of key metrics plus canned reports
  • Partners could still come to ByWater for custom reports
  • Still need to do:
    • review schema changes for Iris
    • test at a larger scale, see where performance concerns start to emerge
    • come up with actual reports (proactive, rather than reactive)
  • Does Metabase work with EBSCO's security requirements, such as a fixed IP?
    • Metabase is installed in a Kubernetes cluster and uses SSH tunneling to the static-IP node
  • Is this focused only on data within the FOLIO instance right now, or also targeting external data, like the EBSCO knowledge base?
    • Not at this time; EBSCO Panorama might cover some of that. Metabase could connect to other data sources, but haven't had a use case for that yet.
  • When you pull together a group of records, can you get the MARC records and export them to a file?
    • Yes, MARC records are in Source Record Storage, in both binary and JSON form
    • Could you also re-import them into FOLIO and create a group based on that?
    • For bulk record edits in the past, I identified the records, exported their UUIDs, used Data Export to pull the records, and processed them outside FOLIO
    • Then, with the list in hand, I used the singleton APIs to update the records one at a time to avoid conflicts (this can be slow, so try not to do it too often)
  • Are you normalizing the MARC data?
    • Not usually; haven't needed really in-depth MARC processing, usually just simple operations with a Python script
  • Question about joins: in a past ILS reporting environment, colleagues had trouble joining data because the field names had been changed for reporting, so it wasn't clear how to join tables
    • Yes, Metabase allows you to add relationships between tables
    • For additional documentation, you could write your SQL CREATE VIEW script to include descriptions
    • Would also like to modularize this a bit: since FOLIO is modular and you can install just a subset of apps, don't want to create views for apps that aren't installed
  • Discussion:
    • It seems this approach still has to handle changes to app data schemas manually
    • Any expectation that data stability will improve over time?
    • Starting to see some coordination around this, at least in the area of MARC
    • Axel will be representing European institutions on the task force to talk about data integration
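A minimal Python sketch of the tenant-specific view generation discussed above follows. It is not ByWater's script (that lives in the folio-reporting repository linked above); it only illustrates the idea, assuming the FOLIO RMB convention that each module's tables live in a `<tenant>_<module>` schema with an `id` column and a `jsonb` column. The view specs, the `folio_reporting` target schema and the `reporting_ro` role are hypothetical names chosen for illustration. The generator collects views from every module into one schema, can emit materialized views instead of plain ones, attaches a description to each view, and skips modules that are not installed.

```python
import re

IDENT = re.compile(r"^[a-z][a-z0-9_]*$")            # plain lower-case SQL identifiers
JSON_KEY = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")   # simple top-level JSON keys

# Hypothetical specs for illustration only: view -> (module, table, {column: JSON key}).
# Verify module, table and key names against your own FOLIO release before use.
VIEW_SPECS = {
    "instance": ("mod-inventory-storage", "instance", {"hrid": "hrid", "title": "title"}),
    "item": ("mod-inventory-storage", "item", {"hrid": "hrid", "barcode": "barcode"}),
    "loan": ("mod-circulation-storage", "loan", {"item_id": "itemId", "action": "action"}),
}


def checked(pattern, value):
    """Refuse any name that could break out of the generated SQL."""
    if not pattern.match(value):
        raise ValueError(f"unsafe name: {value!r}")
    return value


def module_schema(tenant, module):
    """Assumed RMB naming: <tenant>_<module>, with dashes turned into underscores."""
    return checked(IDENT, f"{tenant}_{module.replace('-', '_')}")


def view_sql(tenant, view, module, table, fields, target, reader, materialized):
    """Statements that expose one JSON-backed table as a flat, read-only view."""
    kind = "MATERIALIZED VIEW" if materialized else "VIEW"
    cols = ",\n       ".join(
        ["t.id"]
        + [f"t.jsonb ->> '{checked(JSON_KEY, key)}' AS {checked(IDENT, col)}"
           for col, key in fields.items()]
    )
    name = f"{target}.{checked(IDENT, view)}"
    source = f"{module_schema(tenant, module)}.{checked(IDENT, table)}"
    return [
        f"CREATE {kind} {name} AS\nSELECT {cols}\nFROM {source} AS t;",
        # Keep a description with the view so it is documented in the database itself.
        f"COMMENT ON {kind} {name} IS 'Flattened {table} records from {source}';",
        # The reporting role may SELECT from the view but never write.
        f"GRANT SELECT ON {name} TO {reader};",
    ]


def generate(tenant, target="folio_reporting", reader="reporting_ro",
             materialized=False, installed=None):
    """Build the script a DB admin would review and run for one tenant.

    `installed` optionally lists the modules present, so no views are
    created for apps the tenant does not run.
    """
    checked(IDENT, tenant)
    checked(IDENT, target)
    checked(IDENT, reader)
    stmts = [
        f"CREATE SCHEMA IF NOT EXISTS {target};",
        f"GRANT USAGE ON SCHEMA {target} TO {reader};",
    ]
    for view, (module, table, fields) in VIEW_SPECS.items():
        if installed is not None and module not in installed:
            continue
        stmts += view_sql(tenant, view, module, table, fields,
                          target, reader, materialized)
    return "\n\n".join(stmts)


if __name__ == "__main__":
    print(generate("diku", installed={"mod-inventory-storage"}))
```

Because a PostgreSQL view reads its base tables with the privileges of the view's owner (the default behavior), the reporting role in this sketch needs only USAGE on the reporting schema and SELECT on each view, not access to the module schemas. Materialized views give up real-time data: they must be refreshed with REFRESH MATERIALIZED VIEW to pick up changes.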


Announcements / Reminders (Angela)

Next week (August 2): New Time


Recruiting New Query Developers

  • The Reporting SIG is always on the look-out for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.


Test Data

FOLIO Reporting developers and Reporting SIG members are encouraged to use this new page to share test data cases they have entered into FOLIO reference environments to keep us all aware of what data to expect when we test our queries.


Cluster Ranking

New Report Clusters are added on a regular basis, so it is important to make sure your institution is reviewing these clusters and ranking them to establish report development priorities. If you rank reports for your institution, please follow the instructions below. If someone else ranks, please pass this information along to that person so your institution's vote can be included.

  • Action: Please review Reporting SIG-All Report Clusters (57 issues) in JIRA and RANK each report cluster for your institution (R1-R5)
  • For reporting, institutions only need to rank the UXPROD Report Cluster JIRA issues. All reporting requirements, which are captured in REP-XXX issues, roll up to the UXPROD Report Clusters. Report clusters cover one or more report (REP-XXX issue) requirements.
Update LDP implementers grid (All)

FOLIO LDP1-based Reporting First Implementers Grid
Updates and Query Demonstrations from Various Reporting-Related Groups and Efforts (Community & Coordination, Reporting Subgroup Leads)

Project updates

Reporting development is using small subgroups to address priorities and complete work on report queries. Each week, these groups will share reports/queries with the Reporting SIG. Reporting development team leads are encouraged to enter a summary of their work group activities below.

RA/UM Working Group


MM Working Group

  • The group is still on pause. Our meetings will reconvene Aug. 3, from 12-1pm ET, via Zoom


ERM Working Group


RM Working Group

  • no meeting last week
  • several queries for RM completed, but still need documentation, testing, and review
  • working on queries for the 1.2 FOLIO Analytics release
  • for latest updates, see RM Prototype and Query Development Status
  • On Friday, the RM SIG proposed a reorganization: ERM would be raised to the level of a SIG, RM would join the Acquisitions group to form an Acquisitions SIG, and the Open Access group would also be elevated to a SIG


Reporting SIG Documentation Subgroup

  • No changes
  • Additional Context


External Statistics Working Group

  • no updates currently
  • ACRL query set included in FOLIO-Analytics 1.1 release
  • new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
  • New organizational structure for External Statistics reports
    • external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
    • these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, then the descriptions will point to each of the queries required to run them on the folio-analytics repository
    • institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
    • each reporting development team will take responsibility for the queries in their area for the external statistics clusters


For all recent work on FOLIO Reporting SQL development:


Topics for Future Meetings (All)
  • Alternate FOLIO reporting systems
    • Google Sheets add-on (June 14)
    • ByWater system (July 26)
  • Follow-up on MARC status, Quickmarc/Data Import conflicts
  • How to strengthen connections to SIGs and their developers to be kept in the loop about changes to the data model
  • Show and tell
    • how are institutions using the LDP
    • examples of using the local schema
    • Cornell's report ticketing system
    • Rollout plans from institutions
    • Ask someone on the sysadmin side to talk about LDP administration (Jason Root?)
  • Training topics
    • adding test data in FOLIO snapshot
    • How to do ad hoc querying with the derived tables
    • How to use the LDP app
    • using KNIME to build reports
    • use of local schema for custom tables
    • more on MARC
    • using different applications (other than DBeaver)
    • Postman for API queries, also Insomnia
  • Upcoming:
    • 2 August: Angela presents on KNIME, Postman
    • 9 August: Sharon presents on Virtual DBeaver
    • 16 August: presentation from ERM group on metadb/ERM


Review and update Topics for Future Reporting SIG Meetings 





  • A test Action Item (Ingolf)