2021-07-26 Reporting SIG Meeting notes

Date

26 Jul 2021

Attendees

Present?	Name	Organization	Present?	Name	Organization
X	Sharon Beltaine	Cornell University		Tod Olson	University of Chicago
	Nancy Bolduc	Cornell University	X	Jean Pajerek	Cornell University
X	Axel Doerrer	University Mainz	X	Michael Patrick	The University of Alabama
	Stefan Dombek	Leipzig University	X	Eric Pennington	Texas A&M
X	Jennifer Eustis	U. Massachusetts Amherst / Five College	X	Scott Perry	University of Chicago
	Ingolf Kuss	hbz		Natalya Pikulik	Cornell University
	Jesse Lambertson	University of Chicago		Vandana Shah	Cornell University
X	Eliana Lima	Fenway Library Organization	X	Amelia Sutton	U. Massachusetts
X	Linda Miller	Cornell University	X	Simona Tabacaru	Texas A&M
	Nassib Nassar	Index Data	X	Kevin Walker	The University of Alabama
	Elena O'Malley	Emerson	X	Angela Zoss	Duke University
X	Shelley Doljack	Stanford University	X	Lloyd Chittenden	Marmot
Guest	Ian Walls	ByWater Solutions

Discussion Items

Item	Who	Notes
Attendance & Notes	Angela	Today's attendance-taker: Linda Miller
		Today's note-takers: Team Leads for project updates
Presentation: ByWater reporting solution	Ian Walls	https://github.com/bywatersolutions/folio-reporting
		Have four partners, mostly small; the largest is about 200k bibs.
		Also limited staffing, 2 or 3 librarians total.
		EBSCO takes care of hosting.
		LDP is not a component of these environments.
		Some blockers: the extra infrastructure of a separate database, and the data sync process (a lot of overhead for this scale).
		Still have reporting needs.
		EBSCO is coming out with Panorama, which is good for analytics, but not necessarily for reporting (I have things I need to act on).
		Working through the APIs was challenging; FOLIO is designed to be API-first, which means that it's not data-first.
		The APIs don't really cover everything, don't take into account business logic, and don't always support bulk operations.
		Looking at the data structures in Postgres, you get a somewhat faster response, but the data are stored in separate schemas (harder to query, no foreign key constraints) and are largely in JSON objects (harder to query directly).
		Decided to use views to normalize the data inside the FOLIO Postgres database.
		John at UC had already created these views; Ian has generalized them a bit to work for any tenant. A script lets you input your tenant information and generates SQL statements for your DB admin.
		Don't have to have a separate database and it is real-time, but you could accidentally take down production with a large query.
		This is read-only.
		So far haven't encountered a runaway query; could use "materialize" to cache the view, basically making a copy.
		Views put all modules into the same schema.
		Also had to solve the problem of giving people access to that data; using Metabase, which has cut down the amount of time spent answering support questions by a factor of 10.
		Goal is to make it self-service for the partner librarians: dashboard of key metrics, canned reports.
		Could still come to ByWater for custom reports.
		Still need to do:
		review schema changes for Iris
		test at a larger scale, see where performance concerns start to emerge
		come up with actual reports (proactive, rather than reactive)
		Does Metabase work with EBSCO's security requirements, like fixed IP?
		Metabase is installed in a Kubernetes cluster, and it uses SSH tunneling to the static IP node.
		Just focused on data within the FOLIO instance right now; also targeting external data, like the EBSCO knowledge base?
		Not at this time; EBSCO Panorama might cover some of that. Metabase could connect to other data sources, but haven't had a use case for that yet.
		When you pull together a group of records, can you get MARC records and export to a file?
		Yes, MARC is in Source Record Storage, in binary and JSON.
		Could you also re-import them into FOLIO and create a group based on that?
		For bulk record edits in the past, I have identified the records, exported the UUIDs, used Data Export to pull the records, and processed them outside.
		Get the list, then use the singleton APIs to process the records one at a time to avoid conflicts (can be slow, try not to do them too often).
		Are you normalizing the MARC data?
		Not usually; haven't needed to do really in-depth MARC processing, usually just doing simple operations with a Python script.
		Question about joins: in a past ILS reporting environment, colleagues have had trouble joining data because the field names had been changed for reporting, and they were not sure how to join between data.
		Yes, Metabase allows you to add relationships between tables.
		For additional documentation, you could write your SQL Create View script to include descriptions.
		Would also like to modularize this a bit; since FOLIO is modular and you can install just a subset of apps, don't want to create views for apps that are not installed.
		Discussion:
		Seems like this system still has to manually deal with changes to data schema in apps.
		Any expectation that data stability will improve over time?
		Starting to see some coordination around this, at least in the area of MARC.
		Axel will be representing European institutions on the task force to talk about data integration.
Announcements / Reminders	Angela	Next week (August 2): New Time
		Starting August 2, our SIG weekly meeting time will shift forward one hour, from 9:00 a.m. Eastern to 10:00 a.m. Eastern (use this time zone converter to identify the correct time for your location).
		Our Zoom details will change; please use the details sent via email (also here):
		https://openlibraryfoundation.zoom.us/j/601231377?pwd=ZVFtQWxUaTFLb1J3b1JPdlZqZU1lQT09
		Meeting ID: 601 231 377
		I'm happy to manually add people to my personal calendar event if anyone wants to be added.
		Recruiting New Query Developers
		The Reporting SIG is always on the look-out for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.
		Test Data
		FOLIO Reporting developers and Reporting SIG members are encouraged to use this new page to share test data cases they have entered into FOLIO reference environments to keep us all aware of what data to expect when we test our queries.
		Test Data Updates to FOLIO Snapshot Environments
		Cluster Ranking
		New Report Clusters are added on a regular basis, so it is important to make sure your institution is reviewing these clusters and ranking them to establish report development priorities.
		If you rank reports for your institution, please follow the instructions below. If someone else ranks, please pass this information along to that person so your institution's vote can be included.
		Action =>> Please review Reporting SIG-All Report Clusters (57 issues) in JIRA and RANK each report cluster for your institution (R1-R5).
		For reporting, institutions only need to rank the UXPROD Report Cluster JIRA issues. All reporting requirements, which are captured in REP-XXX issues, roll up to the UXPROD Report Clusters. Report clusters cover one or more report (REP-XXX issue) requirements.
Update LDP implementers grid	All	FOLIO LDP1-based Reporting First Implementers Grid
Updates and Query Demonstrations from Various Reporting Related Groups and Efforts	Community & Coordination, Reporting Subgroup Leads	Project updates
		Reporting development is using small subgroups to address priorities and complete work on report queries. Each week, these groups will share reports/queries with the Reporting SIG. Reporting development team leads are encouraged to enter a summary of their work group activities below.
		RA/UM Working Group:
		Continuing review of queries that need updates, plans for tighter connections to RA SIG
		Will hold off on additional queries to finish up query updates and documentation
		Context:
		Meeting notes: https://docs.google.com/document/d/1UnzG64tl917LOH2FtWhCEPlSOsnJWKL-0eu88Ouo1DU/edit
		Current status of RA/UM issues: https://github.com/folio-org/folio-analytics/issues?q=is%3Aissue+is%3Aopen+RA%2FUM
		MM Working Group:
		The group is still on pause. Our meetings will reconvene Aug. 3, from 12-1pm ET via Zoom
		ERM Working Group:
		ERM - order status report query is out now
		ERM development group working on a presentation on their development work in ERM relying on Metadb prototype
		ERM Prototype and Query Development Status
		RM Working Group:
		no meeting last week
		several queries for RM completed, but still need documentation, testing, and review
		working on queries for the 1.2 FOLIO Analytics release
		for latest updates, see RM Prototype and Query Development Status
		Friday the RM SIG proposed a new organization; ERM would be raised to level of SIG, and RM would join Acquisitions group to form an Acquisitions SIG, also elevate Open Access group to a SIG
		Reporting SIG Documentation Subgroup:
		No changes
		Additional Context:
		Over the next few months, Reporting SIG Documentation Subgroup will be helping to build end-user documentation for https://docs.folio.org/docs/ (much will be linking to existing documentation over on GitHub)
		see Reporting SIG Documentation Subgroup and Guide to Reporting Documentation pages in this wiki
		External Statistics Working Group:
		no updates currently
		ACRL query set included in FOLIO-Analytics 1.1 release
		new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
		New organizational structure for External Statistics reports:
		external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
		these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, then the descriptions will point to each of the queries required to run them on the folio-analytics repository
		institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
		each reporting development team will take responsibility for the queries in their area for the external statistics clusters
		For all recent work on FOLIO Reporting SQL development: https://github.com/folio-org/folio-analytics/commits/main
Topics for Future Meetings	All	Alternate FOLIO reporting systems:
		~~Google Sheets add-on (June 14)~~
		~~ByWater system (July 26)~~
		Follow-up on MARC status, Quickmarc/Data Import conflicts
		How to strengthen connections to SIGs and their developers to be kept in the loop about changes to the data model
		Show and tell:
		how are institutions using the LDP
		examples of using the local schema
		Cornell's report ticketing system
		Rollout plans from institutions
		Ask someone on the sysadmin side to talk about LDP administration (Jason Root?)
		Training topics:
		adding test data in FOLIO snapshot
		How to do ad hoc querying with the derived tables
		How to use the LDP app
		using KNIME to build reports
		use of local schema for custom tables
		more on MARC
		using different applications (other than DBeaver)
		Postman for API queries, also Insomnia
		Upcoming:
		16 August: presentation from ERM group on metadb/ERM
		2 August: Angela presents on KNIME, Postman
		9 August: Sharon presents on Virtual DBeaver
		Review and update Topics for Future Reporting SIG Meetings
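
Illustration for the ByWater presentation item above: the notes describe a script that takes tenant information and generates SQL view statements for a DB admin. The sketch below is not ByWater's script (see their repository for that); it only shows the general idea. It assumes FOLIO module schemas are named `<tenant>_<module>` and that each table has an `id` column plus a `jsonb` column holding the record. The `VIEW_SPECS` mapping, the `reporting` target schema, and the function names are hypothetical.

```python
import re

# Hypothetical spec: (module, table) -> top-level JSON fields to expose as columns.
VIEW_SPECS = {
    ("mod_inventory_storage", "instance"): ["title", "hrid"],
    ("mod_users", "users"): ["username", "barcode"],
}

_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def _safe(name):
    """Reject anything that is not a plain lowercase SQL identifier."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def build_view_sql(tenant, target_schema="reporting", materialized=False):
    """Return a list of SQL statements for a DB admin to review and run."""
    tenant = _safe(tenant)
    target_schema = _safe(target_schema)
    # Plain views always read live data; materialized views cache a copy.
    create = (
        "CREATE MATERIALIZED VIEW IF NOT EXISTS"
        if materialized
        else "CREATE OR REPLACE VIEW"
    )
    statements = [f"CREATE SCHEMA IF NOT EXISTS {target_schema};"]
    for (module, table), fields in VIEW_SPECS.items():
        # Assumed naming convention: one schema per tenant and module.
        source = f"{tenant}_{_safe(module)}.{_safe(table)}"
        # Pull each JSON field out as a text column next to the record id.
        columns = ["id"] + [f"jsonb->>'{_safe(f)}' AS {f}" for f in fields]
        statements.append(
            f"{create} {target_schema}.{table} AS "
            f"SELECT {', '.join(columns)} FROM {source};"
        )
    return statements


if __name__ == "__main__":
    for statement in build_view_sql("diku"):
        print(statement)
```

Because every view lands in one target schema, a tool such as Metabase can be pointed at that schema alone. With `materialized=True` the statements create cached copies instead of live views, which would then need a periodic `REFRESH MATERIALIZED VIEW` to stay current.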

A test Action Item (Ingolf)