2022-03-17 Reporting SIG Meeting notes

Date

Attendees

Present?

Name

Organization

Present?

Name

Organization


Arthur AguileraUniversity of Colorado, BoulderxLinda MillerCornell University
xSharon BeltaineCornell UniversityxNassib NassarIndex Data

Erin BlockUniversity of Colorado, Boulder
Elena O'MalleyEmerson

Nancy BolducCornell University
Tod OlsonUniversity of Chicago

Lloyd ChittendenMarmot
Jean PajerekCornell University

Axel DoerrerUniversity Mainz
Michael PatrickThe University of Alabama

Shelley DoljackStanford University
Eric PenningtonTexas A&M


Stefan DombekLeipzig University
Scott PerryUniversity of Chicago

Jennifer EustisU. Massachusetts Amherst / Five College
Natalya PikulikCornell University
xAlissa HafeleStanford University
Vandana ShahCornell University
xJeanette KalchikStanford University
Amelia SuttonU. Massachusetts
xKevin KishimotoStanford University
Simona TabacaruTexas A&M

Ingolf KusshbzxHuey-Ning Tan

Stanford University


Kim LaineCornell UniversityxIrina TrapidoStanford University
xJoanne LearyCornell University
Kevin WalkerThe University of Alabama

Eliana LimaFenway Library OrganizationxAngela ZossDuke University

Eric LuhrsLehigh University


Visitor:

Mark Arnold, Missouri State University

Absent


Discussion Items

Item

Who

Notes

Attendance & NotesAngela

Attendance & Notes

  • Today's attendance-taker: ?
  • Today's note-takers:  Team Leads for project updates
LDLite demonstrationMark Arnold

LDLite

https://github.com/library-data-platform/ldlite

  • Missouri State first US institution to go live with FOLIO
  • had reports that were generated in previous system that library staff wanted to retain
  • info not really available via API
  • setting up LDLite was something we wanted to do anyway, gave us a chance to do other things 
  • are an EBSCO-hosted library, so don't have access to LDP
  • if we wanted to do anything using SQL, LDLite was only option
  • don't mind doing stuff with the APIs, enjoy that, but really like SQL
  • having things in an SQL format gives us some additional options we wouldn't normally have
  • using the variation Nassib mentioned - MSU is taking data from LDLite and putting it into a Postgres database
  • 1 database - complete harvest of all data; not updated often; have 1.4 million bib records, full harvest takes about 33 hours (VM with 4 processors 8 GB RAM, 300 GB storage), including SRS (15 hours just by themselves)
  • set that up first just to see what it was like
  • soon after, librarian needed to do some stats, didn't want them to have all the data, so created a smaller database and only harvested the data they need (collections, circulation) - harvest took about 9 hours
  • have been around libraries since 1993; back in mid-'90s, if you wanted to do a free scripting language, PHP and mySQL was the thing. don't have background in Postgres or Python, but have been able to figure it out.
  • have some scripts that run the harvest - start it end of day, let it run overnight. 
  • working on something else, with help from Nassib - want to do incremental updates. want to run a script that only brings changes since the last run of the script. hopefully run in the morning and get the data for working in the afternoon.
  • like the idea of using multiple databases to control access - just don't even include sensitive data in certain databases
  • script using is a variation of the script Nassib has on the GitHub page
  • have added the call into the database
  • have also created a text output; says when it starts and ends, also writes the endpoint and the tables it wrote for that endpoint; could also have it print table structure
  • have DBeaver connected to the VM for running queries; VM is ubuntu server, do everything commandline, so wanted DBeaver for an interface for queries
  • Questions: institutions accessing legacy data through LDP
    • Cornell has saved off data in a Postgres database, hope to bring it into the LDP; have pulled data out of legacy, do have a bunch that has been pulled into the local schema 
    • have been pulling requests data from Ares into the LDP to blend with FOLIO
    • have been talking about getting the data from the Voyager archive and using that; would really love that
    • have been talking about getting bits and pieces of the Voyager data separately (it's static, so just have to do it once)
    • getting the data from the legacy system into a format that will work is a big project; would have been nice to bring it in just before implementation
    • during migration, some things may have accidentally gotten skipped, so good to have that backup, would be easier if it was already in LDP
    • hard to query out the circ data and then bring it into the LDP
    • would be easier if it were all there
    • Stanford is likely to go straight to Metadb, which will eventually have a concept of multiple input sources; FOLIO would be one input source, but you could possibly set up the legacy ILS as a second input source, and Metadb could pull everything automatically (or a defined subset); and then if you needed to run both systems at the same time, it could continue to pull data from the legacy system
  • any lessons learned for implementation?
    • not terribly difficult, just install LDLite (one dependency, maybe); had more trouble getting Postgres installed and configured than LDLite
    • originally tried just using LDLite without database, but not great for harvesting absolutely everything; if you are going to use it for a lot of data, recommend go ahead and install Postgres and connect LDLite to that
    • that means you'll have to learn Postgres, at least a little
    • have gotten everything done with Google
    • If you're using LDLite in place of an LDP or Metadb instance, probably makes sense to have a shared database that would mimic an LDP, and use a regular scheduled to have LDLite update the data; probably works better for a smaller library where you can download the data regularly; a larger university probably needs the full LDP and Metadb
    • if Mark can get the incremental updates working, the plan is to write up the script and schedule it to run every night, which would give the users access to yesterday's data just like LDP users
  • what kind of documentation is there? you talk about having different databases that cover different things, how do you share that with users?
    • there's only two people using it right now at MSU, so not too many people who need to know; just shared the database connection details and he went ahead with DBeaver; he doesn't know about the other database he doesn't have access to
    • we're a small library - 45 employees, 3 IT staff, Mark is only one doing FOLIO stuff. documentation is just for him.
    • documentation for other universities
    • Note: we hope to have LDLite page up for the Juniper version of the FOLIO docs soon: https://juniper.docs.folio.org/docs/reporting/
  • Mark is happy to share the script, just message him on Slack

Announcements /
Reminders

Angela

Introductions of everyone


Questions

  • Entity-Relationship Diagrams
    • Latest version we have for LDP: https://glintcore.net:8443/ldp/schemaspy/public/relationships.html
    • could generate it specifically for Cornell
    • running it against the FOLIO sample data has been problematic; the sample data has duplicated identifiers, which prevents us from tracking with tables connect to each other
    • so, need to use this approach with real database rather than something like FOLIO snapshot
    • for Metadb there is a new version that works a bit differently
    • Nassib willing to talk to Cornell about trying to help with that
    • still trying to figure out relationships between tables in the various areas
    • hoping to start creating diagrams for derived tables
    • would also be great to have for each functional area/query subteam


FOLIO Analytics repository pre-release!


Recruiting New Query Developers

  • The Reporting SIG is always on the look-out for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.


Updates and Query Demonstrations from Various Reporting Related Groups and EffortsCommunity & Coordination, Reporting Subgroup Leads

Project updates

Reporting development is using small subgroups to address priorities and complete work on report queries.  Each week, these groups will share reports/queries with the Reporting SIG.  Reporting development team leads are encouraged to enter a summary of their work group activities below.

RA/UM Working Group


MM Working Group

  • Meetings are 1st Tuesday of the month, 12-1pm ET via zoom using the usual FOLIO password. Our lab sessions are open to everyone. Please bring your questions, examples, and comments about reporting and metadata.
  • We have submitted our slate of derived tables for Metadb for Lotus R1 2022. We completed 16 derived tables for metadb!
  • We are turning our attention to finish porting over derived tables to metadb. We have a sign up sheet. We are looking for reviewers.
  • In next our lab session, we will use the LDP instance identifiers and go through the process of porting this over to metadb, how to contribute a pull request with the changes and update the runlist.txt, and go through a review.


ERM Working Group

  • Fixed data types for foreign keys: #554, #551, #550
  • Will now refocus on documentation and new derived table developement
  • ERM Prototype and Query Development Status
  • Meetings are bi-weekly on tuesdays 11am ET alternating with RM Working Group
    • Next meeting will be at 15th, March
    • Contact Axel Dörrer if you would like to get a calendar invitation.
  • Mermaid diagrams can now display directly in GitHub markdown! Example


RM Working Group


Reporting SIG Documentation Subgroup

  • Honeysuckle documentation is live on https://docs.folio.org/docs/
  • Iris documentation is in progress, due December 15
  • Additional Context
    • The Reporting SIG has representation on the Documentation Working Group, which is building end-user documentation for https://docs.folio.org/docs/ (mostly linking to existing documentation over on GitHub)


External Statistics Working Group

  • no updates currently
  • new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
  • New organizational structure for External Statistics reports
    • external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
    • these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, then the descriptions will point to each of the queries required to run them on the folio-analytics repository
    • institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
    • each reporting development team will take responsibility for the queries in their area for the external statistics clusters


Product Council



For all recent work on FOLIO Reporting SQL development:


Demonstration of GitHub Desktop (probably next meeting)Angela
Topics for Future MeetingsAll
  • Next: Reviewing a pull request 
  • How to deal with External Stats reports?
    • maybe subteam leads check in about that
    • probably wait until after Metadb conversion is more complete
  • Definitely do GitHub Desktop demonstration at a SIG meeting in future, probably about 15 minutes
  • Query optimization by Nassib: Mar 28 and Apr 7


Review and update Topics for Future Reporting SIG Meetings 





  • A test Action Item (Ingolf)