2023-01-19 Reporting SIG Meeting notes

Date

Attendees

 Present?

Name

Organization

Present?

Name

Organization

xArthur AguileraUniversity of Colorado, Boulder
Eric LuhrsLehigh University
xSharon BeltaineCornell UniversityxLinda MillerCornell University

Erin BlockUniversity of Colorado, Boulder
Nassib NassarIndex Data

Nancy BolducCornell UniversityxElena O'MalleyEmerson

Shannon BurkeTexas A&M
Tod OlsonUniversity of Chicago

Suzette CanedaStanford University
Jean PajerekCornell University

Lloyd ChittendenMarmot
Michael PatrickThe University of Alabama

Tim DannayMount Holyoke College
Eric PenningtonTexas A&M

Axel DoerrerUniversity MainzxScott PerryUniversity of Chicago


Shelley DoljackStanford UniversityxNatalya PikulikCornell University

Stefan DombekLeipzig University
Bob ScheierHoly Cross

Jennifer EustisU. Massachusetts Amherst / Five CollegexVandana ShahCornell University
xLynne ForsWellesley College
Linnea ShiehStanford University

Lisa FurubottenTexas A&M
Clare SpitzerStanford University
x  Alissa HafeleStanford University
Amelia SuttonU. Massachusetts
xKara HartWellesley College
Simona TabacaruTexas A&M
xJamie JesanisWellesley College
Huey-Ning Tan

Stanford University


Jeanette KalchikStanford University
Vitus TangStanford University

Kevin KishimotoStanford University
Irina TrapidoStanford University

Ingolf Kusshbz
Catherine TuohyEmmanuel College
xAlexander LaoStanford University
Kevin WalkerThe University of Alabama
xJoanne LearyCornell UniversityxAngela ZossDuke University
xEliana LimaFenway Library Organization


Discussion Items

Item

Who

Notes

Attendance & NotesAngela

Attendance & Notes

  • Today's attendance-taker: Linda (or substitute)
  • Today's note-takers:  Team Leads for project updates

Announcements /
Reminders

Angela

New: Join our Review Board

  • Great way to start learning to run LDP1 and Metadb queries, get comfortable with GitHub
  • No experience required! We will offer training.
  • Contact Sharon Beltaine or Angela Zoss (Old) if you have questions or would like to volunteer


Homework #1: Feedback on derived table documentation project

This summary from the project team gives some information about the state of the project and asks for feedback on two questions. We will have a discussion on these questions during the SIG meeting on Monday, January 23. Please review the document in advance. If you won’t be attending the Jan 23 meeting, feel free to share feedback in advance (via Slack, email, or comments on the document).


Homework #2: Reporting SIG member survey

Sharon and Angela would love to hear things we could do to improve your experience of the SIG. This survey is your chance to give us feedback! Please do take the survey, even if you are only somewhat active with the SIG. We would like responses by Friday, January 27.


Update on 2023 Goals:

  • Goals for 2023
  • After our Jan 5 meeting to discuss goals, we also discussed them on Monday, Jan 9
  • Overall there was agreement, but in the Monday meeting we added some rough categories to summarize the main areas for our goals
  • Many goals fall on the development teams, so please do consider joining a development team if you're not already active on one
  • Largely hoping to complete our in-progress projects, but a few additional priorities have been outlined


Any new members?

  • Welcome/introductions
  • Would anyone like a buddy? Like to be a buddy?


How to find our latest recordings


(Always) Recruiting New Query Developers

  • The Reporting SIG is always on the look-out for new query developers. Please let us know if you are interested in doing query development or if there are others at your institution who might be a good fit.


Tips on querying date fields
  • Slides: Working with Dates in SQL in the Library Data Platform

  • Basic date and time formats:
    • there are many formats for date and times, and you can use SQL to transform between different formats
    • date, interval, text/VARCHAR, timestamp, timestamptz
    • note about timezone - it will appear as an offset from UTC; so, -0500 is 5 hours behind UTC; Eastern US timezone flips between -0500 or -0400, depending on the time of the year and the daylight savings adjustments
  • Example of something you might want to do: create list of orders made on March 1, 2022
    • to be able to filter on a date, you might try to just type the date as text, but if the field is a timestamp, that won't necessarily work. there is extra time information in the field, and SQL isn't going to automatically extract just the simple day from the timestamp.
    • you can cast the timestamp to a date with ::DATE. 
  • Another task: filter on a range of dates
    • in our FOLIO Analytics report queries, we often use a common table expression (CTE, or a way of defining a set of values inside the query) at the top of the query to specify parameters, and those often include date ranges (one parameter for start date, another for end date)
    • then you can filter to records where the dates are greater than or equal to the start date and less than the end date (there is a special way to call the parameters in these queries, check slides for examples)
  • CURRENT_DATE is a function that returns the current date as a DATE; can include that as part of your SELECT statement if you want it to appear in your table of results
    • there is also a CURRENT_TIMESTAMP
  • NOW() returns a timestamptz
  • INTERVAL is something like week, day, month, etc.; e.g., INTERVAL '1 week'
    • can combine CURRENT_DATE with INTERVAL, e.g., CURRENT - INTERVAL '1 week' as start date and CURRENT_DATE as end date to get data from previous week
  • TO_CHAR
    • sorting or filtering dates in Excel might require dates that aren't "pretty"; might want to wait to format the date until Excel
    • if you do want to format in SQL, can use TO_CHAR, which will convert the date to something that looks nicer (e.g., mm/dd/yyyy)
    • But these can't be sorted as dates
    • This works with time components, too: need to use formatting codes like "mm/dd/yyyy hh:mi am"
    • if you're not sure if your date is a true date or just text that looks like a date, dates are aligned to the right in a column, and text is aligned to the left
    • If you want to use dates for calculations, sorting, or filtering, you probably need to keep them in their native format or cast them to a date or timestamp format
  • dates from JSON
    • If dates are in a data array, you have to pull them out from the JSON, and you will only get the date as text (the LDP1 software won't have done the date conversion for you); use casting to convert it to the format that matches both the data you're getting and also the final date you want
  • extracting date parts
    • this works on actual DATE/TIMESTAMP/TIMESTAMPTZ data, not just text
    • options include year, month, day, hour, minute, second, dow (this is day of week)
    • note that extract year may result in number formatted like 1,992; could convert those to VARCHAR if you don't need to do any additional calculation
    • can also get the names for the month and weekday
      • extract month number or day number or day of week number with DATE_PART
      • with TO_CHAR, can also specify a date part (e.g., "Mon", "Dy") to get the text version of the month or day
      • EXTRACT works like DATE_PART, just slightly different syntax; returns numbers
  • measuring length of date interval
    • AGE function can calculate days and times elapsed between two dates
      • start with end date, then start date for the age in positive numbers
    • Can also just subtract dates (easiest with simple dates)
    • Can also use INTERVAL to calculate a specific date based on a length of time relative to today (or another fixed date)
  • documentation for PostgreSQL date/time functions
    • sometimes PostgreSQL is different from other flavors of SQL, so maybe look specifically for postgres advice
Review of In-Progress Projects (Recurring)
Review the release notes for FOLIO Analytics, LDP1, LDLite, LDP Reporting App, ldpmarc, Metadb Projects (Recurring)
Updates and Query Demonstrations from Various Reporting Related Groups and Efforts Projects (Recurring)Community & Coordination, Reporting Subgroup Leads

Project updates

Reporting development is using small subgroups to address priorities and complete work on report queries.  Each week, these groups will share reports/queries with the Reporting SIG.  Reporting development team leads are encouraged to enter a summary of their work group activities below.

RA/UM Working Group


MM Working Group

  • Meetings are 1st Tuesday of the month, 12-1pm ET via zoom using the usual FOLIO password. Our lab sessions are open to everyone. Please bring your questions, examples, and comments about reporting and metadata. MM notes
  • We will be fixing the LDP1 derived table instance_ext.sql which has a duplicate field (record_source and instance_source). The column record_source will be removed and comments will be added to the derived table.
  • If you are able to test ldpmarc 1.6 new release, please do so.


ERM Working Group


RM ACQ Working Group


Reporting SIG Documentation Subgroup

  • Morning Glory documentation is live on https://docs.folio.org/docs/
  • Nolana documentation is in review
  • Orchid documentation will be in progress soon, and plans are underway to include beta-level documentation for Metadb
  • Additional Context
    • The Reporting SIG has representation on the Documentation Working Group, which is building end-user documentation for https://docs.folio.org/docs/ (mostly linking to existing documentation over on GitHub)


External Statistics Working Group

  • no updates currently
  • new organizational/tracking scheme for JIRA, with pointers to queries in folio-analytics repository
  • New organizational structure for External Statistics reports
    • external statistics reports (e.g., ACRL) typically require running queries from different functional reporting areas
    • these reports will be captured in JIRA under one UXPROD-XXXX report cluster issue, then the descriptions will point to each of the queries required to run them on the folio-analytics repository
    • institutions will need to rank each of these 8 new UXPROD-XXXX report cluster issues
    • each reporting development team will take responsibility for the queries in their area for the external statistics clusters


Product Council


For all recent work on FOLIO Reporting SQL development:


Topics for Future MeetingsAll
  • Jan 23: discuss derived table documentation homework
  • Feb 2 (tentative): Review Board Training
    • Identify a pull request that needs a review
    • Assign yourself to review a pull request
    • Reviewing a pull request using the check list
    • Communicating with a PR submitter
    • Differences between LDP1 and Metadb queries to look out for
  • Feb 13: review LDP1 Hosting Tips document
  • Exploring new recruitment/onboarding strategies (e.g., buddy system)
  • More work on asynchronous collaboration, how to engage in discussions and question answering more broadly
    • consider connecting discuss.folio.org with a Slack channel, to make sure any forum topics get highlighted on Slack as well?
  • Open question: should we update FOLIO LDP1-based Reporting First Implementers Grid
  • Cleaning up our wiki pages
  • talk about preparation for Metadb: do institutions have questions? is there anything we should be working on to try to support them?


Review and update Topics for Future Reporting SIG Meetings 





  • A test Action Item (Ingolf)