FOLIO Face-to-Face DC Conference Reporting Session Notes

Date

Attendees

NOTES

Purpose of establishing naming conventions:

  • organization should be simple, clear, and logical
  • need to be useful for end users
  • organization of reports and queries should be as logical and simple as possible; library technology right now has incredible complexity that explodes in every corner; would like to organize things so it's clear and simple for people coming in
  • what we're really doing is taxonomy, which is both organization and naming


Example: Circulation Item Detail Report

  • there are two main types of circulation reports: aggregate counts, transaction details
  • "circulation" usually refers to physical items, and "usage" usually refers to electronic resources


Elements relevant to naming:

Element | Value for Example Report | Description
Requirements ID | ID401
JIRA Ticket | REP-103
Name of Report
  • Circulation Transaction Detail
  • Circulation Transactions
  • Loans with Material Type
  • Circulation Transactions with Bibliographic Detail
  • Loans with Material Type and Bibliographic Detail
  • Loans with Item Detail

The report name should focus on how the report is going to be used, not on how to get the data.

This will also be the name of the JIRA issue.

Report Naming Construct

  • by = filter or focus; appropriate for reports that aggregate transactions, not for transaction-level reports
  • with = what the report includes, details
  • only include as many extra clauses as are necessary to distinguish this report; if "bibliographic detail" is expected, no need to list it
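
A minimal sketch of the naming construct above, assuming a report name is a noun followed by optional "by" and "with" clauses. The function name `build_report_name`, its parameters, and the "Loan Counts" example are hypothetical and were not discussed in the session.

```python
def build_report_name(noun, by_focus=(), with_details=()):
    """Compose a report name from a noun plus optional "by" and "with" clauses.

    by_focus:     filter or focus (for reports that aggregate transactions)
    with_details: extra details the report includes
    """
    name = noun
    if by_focus:
        name += " by " + " and ".join(by_focus)
    if with_details:
        name += " with " + " and ".join(with_details)
    return name


# Transaction-level report: only "with" clauses.
print(build_report_name("Loans", with_details=["Material Type"]))

# Aggregate report (hypothetical name): "by" clause first, then "with".
print(build_report_name("Loan Counts", by_focus=["Location"],
                        with_details=["Material Type"]))
```

Leaving a clause out when it is expected (e.g., bibliographic detail) is just a matter of not passing it in.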

Note: we should create a markdown file for each report that would live in a reports folder on GitHub; it should include original purpose of the report, description of the different queries, links to queries

Can also link to report markdown files from the main ldp-analytics markdown file
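
One possible shape for the per-report markdown file, as a rough sketch only. The section headings, the function name `render_report_markdown`, and the query link path are assumptions for illustration; the actual layout is still to be decided.

```python
def render_report_markdown(report_name, purpose, queries):
    """Render a report description as markdown text.

    queries: list of (query_name, description, link) tuples.
    """
    lines = [f"# {report_name}", "", "## Purpose", "", purpose, "", "## Queries", ""]
    for query_name, description, link in queries:
        lines.append(f"- [{query_name}]({link}): {description}")
    return "\n".join(lines) + "\n"


text = render_report_markdown(
    "Loans with Material Type",
    "Original purpose of the report goes here.",
    # Hypothetical link path; the real folder layout is not settled.
    [("Count Loans for Locations", "counts loans per location",
      "queries/count_loans_locations.sql")],
)
print(text)
```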

Question: how to keep track of work on things like this in JIRA

Name(s) of query(ies) | ex.: Count Loans for Locations

The query is how to get the data. Possible naming convention for short names:

verb, source data, function
example: count loans for locations

or maybe noun, verb, function

example: Loans - Count by Location with Material Type

Will likely create JIRA issues for all queries, maybe in LDP project; then can link from LDP query issue to report(s) that contain(?) that query

Idea from Angela:

  • for noun, what is each row in the resulting table? is it a loan? a general circ transaction? a patron?
  • for verb, either count or list?
  • for modifiers, focus on with and by? only add ones that change between versions of queries?
Short name(s) of query(ies) | ex.: count_loans_locations
  • needs to be machine-readable, usable as a variable
    • underscores are better than dashes because some programs don't like dashes in variable names
  • may or may not need to look like the regular name; might need to be shorter
  • use as GitHub folder name
  • if adding something to indicate aggregation (_counts, _detail), add to end so you can sort
  • like with short name, maybe use verb after nouns so things can be sorted by main topic of interest
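
A sketch of how a short name could be derived from a regular query name, assuming lowercase words joined by underscores, with any aggregation suffix added at the end. The function name `make_short_name` and the list of dropped filler words are assumptions, not something agreed in the session.

```python
import re


def make_short_name(name, suffix=None, drop_words=("for", "of", "the")):
    """Turn a query name into a lowercase, underscore-separated identifier."""
    # Keep only runs of letters and digits, so dashes and spaces disappear.
    words = re.findall(r"[0-9a-z]+", name.lower())
    # Drop filler words to keep the name short (assumed list).
    words = [w for w in words if w not in drop_words]
    if suffix:
        # Aggregation markers such as "_counts" or "_detail" go at the end.
        words.append(suffix.strip("_").lower())
    short = "_".join(words)
    # Variable names cannot start with a digit in most languages.
    if short and short[0].isdigit():
        short = "_" + short
    return short


print(make_short_name("Count Loans for Locations"))
print(make_short_name("Loans", suffix="_counts"))
```

Because the result contains only letters, digits, and underscores, it can be used both as a variable name and as a GitHub folder name.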
Umbrella category for report | Circulation

Group reports by the 5 most used apps of FOLIO that reports will be generated for:

  • users
  • circulation (check out, check in, requests)
  • inventory
  • finance and orders
  • resource management

just use umbrella category to organize reports, not to name them (could be the directory structure in LDP GitHub)

use tagging (e.g., in GitHub?) to help organize reports, maybe to add original functional area to query

Additional information that would be useful when browsing reports in, e.g., GitHub:

  • date of query
  • tag indicating original functional area



Report Name Discussion:

Question: do we really need to specify "bibliographic detail" in the report name? Is it reasonable to assume bibliographic data will be included?

  • when is something like that assumed to be in a report, and when does it need to be called out in the name?
  • can we just make a general "Loans" report with all of the columns and let people discard columns they don't need later?
    • end users would probably be happier with more specific subset reports already created for them; they will want to see both "Loans with Material Type" and "Loans with Service Point", even if at their core they are similar and we could hypothetically create a master "Loans" query with all necessary columns
    • going forward, maybe that's how we organize our work on report clusters: create master queries with a large set of columns needed across several reports, then develop queries that build off that query to tailor the results for specific report requests (see workflow below)


Upper-level grouping options:

  • Circulation (use App names/API structure; users, circulation, inventory, finance and orders, resource management)
    • maybe this is most readable for non-expert analysts
  • Resource Access (stick with current functional areas)
    • useful for report development, but maybe not useful for UX; could potentially just include tags on GitHub


General workflow for reports:

  • cluster reports by topic/purpose/common data elements
  • figure out the query that includes all necessary data elements (full, larger report with all the difficult joins)
  • build additional queries that filter/subset that full query as specified
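
The workflow above could look roughly like this, using an in-memory SQLite database as a stand-in. The tables, columns, and rows are made up for illustration and do not reflect the actual LDP schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up tables and rows, for illustration only.
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, material_type TEXT, location TEXT);
    CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, item_id INTEGER, loan_date TEXT);
    INSERT INTO items VALUES (1, 'book', 'Main'), (2, 'dvd', 'Annex');
    INSERT INTO loans VALUES (10, 1, '2019-06-01'), (11, 1, '2019-06-02'), (12, 2, '2019-06-02');

    -- The full "master" query holds all the joins.
    CREATE VIEW loans_master AS
    SELECT l.loan_id, l.loan_date, i.material_type, i.location
    FROM loans AS l
    JOIN items AS i ON i.item_id = l.item_id;
""")

# Tailored queries subset or aggregate the master query.
loans_with_material_type = conn.execute(
    "SELECT loan_id, loan_date, material_type FROM loans_master ORDER BY loan_id"
).fetchall()

count_loans_locations = conn.execute(
    "SELECT location, COUNT(*) FROM loans_master GROUP BY location ORDER BY location"
).fetchall()

print(loans_with_material_type)
print(count_loans_locations)
```

The difficult joins live in one place, so a fix to the master query carries over to every query built on it.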


Filtering Questions and Names of Reports and Queries

  • only include columns for which there is a one to one join
  • Where vs Select Filtering
    • Query = join loans
    • Query = filter material types
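
One reading of the first bullet is that a join should never multiply the rows of the main table. A small sketch of that check, assuming rows are plain dictionaries; the helper name `join_keeps_row_count` and the sample data are hypothetical.

```python
from collections import Counter


def join_keeps_row_count(left_rows, right_rows, key):
    """Return True if each left row matches at most one right row on `key`."""
    right_counts = Counter(row[key] for row in right_rows)
    return all(right_counts[row[key]] <= 1 for row in left_rows)


# Made-up sample data for illustration.
loans = [{"loan_id": 10, "item_id": 1}, {"loan_id": 11, "item_id": 2}]
items = [{"item_id": 1, "material_type": "book"},
         {"item_id": 2, "material_type": "dvd"}]
item_notes = [{"item_id": 1, "note": "damaged"},
              {"item_id": 1, "note": "repaired"}]

print(join_keeps_row_count(loans, items, "item_id"))       # each loan has one item
print(join_keeps_row_count(loans, item_notes, "item_id"))  # loan 10 would be duplicated
```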


Linking between JIRA and future report documentation (GitHub):

  • Issue title should change to the new naming convention
  • don't try to create new JIRA projects for the umbrella categories; all dwreport issues stay in REP 
  • to filter down to reporting umbrella categories, could create a new field or just add more tags. Solution TBD.


Special cases:

  • external reports (e.g., ARL) - the "report" will be a collected set of queries required for external statistics; each of the queries, however, might be useful for other reports
    • Question: how does that work with GitHub? If we update the query in one place, do we just have to remember to update it everywhere else it might be used? 


Resource Access Visit


Topics

  • how can reporting step in for workflow, which won't cover everything RA needs in first iteration
    • example: items that need to change state after certain amount of time
    • RA will need to sit down and identify things that it was assumed the workflow engine would take care of
    • is any of it reporting functionality? may be, but some may just be normal reports
    • maybe set up a meeting with RA SIG, small group?
    • RA will probably just work on it together first
    • this may be part of a great use case for additional development resources
    • will have to do some analysis of in-app vs. LDP; only have resources to work on LDP (check out the new LDP vs. In-App explanation)
  • In-App vs. Data Warehouse
    • changing UXPROD-933 to REP; it is a partial duplicate of UXPROD-1327, so there is no need for two in-app reports, and it makes sense to be able to run this kind of report against historical data
    • for data privacy, reports with patron identifiable data may need to stay in-app; includes things like fees/fines?

building report prototypes is like a mycelial network: you build one report, but then can tailor different branches

Demo of schemaspy, GitHub, test data

Resource Access has additional report requirements


App Interaction: ERM + eUsage + Reporting

The eUsage app would like to include a statistics summary that would require joining data between ERM and eUsage modules. Rather than having the app know how to join its data to another app, should they try to leverage the joins that the LDP is already doing? How might that work?

  • maybe app would connect to LDP through ODBC and submit query? or maybe we would offer an LDP API?
  • problem: they would have to be okay with day-old data, wouldn't be able to harvest COUNTER data and immediately see cost/use
    • this does seem like a concern, people might want pretty fast responses, especially if they're using this stats summary to perform QA on the data they just uploaded
      • (Q: could they do QA on the COUNTER data separately in the app, before it gets joined to cost data?)
    • don't forget, day-old is just for go-live, may eventually be faster
  • this should really be a conversation across apps:
    • other apps might (will) need to join data, too
    • if other apps are joining data directly and don't need the LDP, maybe we shouldn't do something special for ERM; but maybe everyone will want to run from the LDP

Note: this meeting also included a tangential but important discussion about non-COUNTER data in the eUsage app. There are many difficulties getting non-COUNTER data to be usable in the app - it could be uploaded, but any calculations are hard to do unless you have the known structure of something like COUNTER. Talked about possibly adding a special uploader to match non-COUNTER data to important fields, but even that might be out of scope. Is there a chance the LDP would be a better option? If the LDP allows for custom data (could be a separate, local data table), then they could at least query against COUNTER and non-COUNTER data using the same database. That might not be perfect for the analysts, though, who would prefer to be able to quickly upload data in whatever format they have (JSON?) and be able to calculate cost/use easily.
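
As a rough illustration of the cost/use calculation once COUNTER and non-COUNTER usage sit side by side, here is a sketch that assumes both sources can be reduced to (resource, uses) pairs. The function name `cost_per_use`, the resource identifiers, and all numbers are made up.

```python
def cost_per_use(costs, usage):
    """Compute cost per use for each resource.

    costs: {resource_id: cost}
    usage: iterable of (resource_id, uses) pairs from any source
    """
    totals = {}
    for resource_id, uses in usage:
        totals[resource_id] = totals.get(resource_id, 0) + uses
    # Skip resources with no known cost or no recorded use.
    return {
        resource_id: costs[resource_id] / uses
        for resource_id, uses in totals.items()
        if resource_id in costs and uses > 0
    }


# Made-up values for illustration.
costs = {"journal-a": 1000.0, "journal-b": 300.0}
counter_usage = [("journal-a", 150)]
local_usage = [("journal-a", 50), ("journal-b", 100)]  # non-COUNTER data

result = cost_per_use(costs, counter_usage + local_usage)
print(result)
```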


Planning for Closing Session

  • We have 10 minutes to summarize what we've done
  • Angela will talk about naming conventions
  • Sharon will give an overview, talk about RA, a bit about eUsage
  • Scott will talk about ERM, RM
  • Should we cover eUsage? Reviewed the discussion yesterday, think it's possible to repurpose the LDP joins for cross-App issues, but not sure if we'll be offering anything custom

Data Dictionary 

  • can we automate data dictionary with info from the dev API documentation?
  • From Nassib: data dictionary is a vague concept, we should agree on what it means. What are we going to mean by "data dictionary"?
    • Option 1 ("Data Dictionary"): documentation of the attributes in the LDP (fairly common definition); an attribute is a column in a table in a relational database; "documentation" would include values of the fields, what they are, etc. Could write it from a librarian point of view. It's complete documentation - everything you need to know to be able to use it in queries. Could take the form of just filling in the comment field of the attributes in the database. Note: we want to do better than the FOLIO documentation, that is too minimal and our data is slightly different. Would be nice to include where the data came from. (Maybe people could help.)
    • Option 2 ("Data Dictionary + Glossary"): FOLIO needs dox, LDP needs dox, sort of covering the same attributes (pretty close). What if there was a third document - a data dictionary, kind of like a glossary of librarian terms. Term is the technical term for librarians, has definition, then LDP and FOLIO dox could point to the glossary. Counterpoint - librarians would not really agree on the definitions. But if we had definitions, we could use technical terms in the data dictionary without having to be too general. This would also mean FOLIO can share our documentation work.
  • One model of the elements we would need for the LDP documentation is FOLIO-1551. If Option 1, this would be the same as the data dictionary. If Option 2, LDP dox would follow this specification and refer to the glossary, which could be called the data dictionary. This information will definitely be in the comments, no matter what we call the data dictionary.
  • The main data dictionary issue is UXPROD-1414.
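
For Option 1, filling in the comment field of each attribute could be scripted. A minimal sketch, assuming a PostgreSQL-style database that supports COMMENT ON COLUMN; the function name `comment_statements`, the table and column names, and the descriptions are hypothetical.

```python
def comment_statements(table, attribute_docs):
    """Build COMMENT ON COLUMN statements from {column: description}."""
    statements = []
    for column, description in attribute_docs.items():
        # Double any single quotes so the description is a valid SQL string.
        escaped = description.replace("'", "''")
        statements.append(f"COMMENT ON COLUMN {table}.{column} IS '{escaped}';")
    return statements


# Hypothetical table, columns, and descriptions.
docs = {
    "loan_date": "Date the item was checked out",
    "patron_group": "The patron's group at the time of the loan",
}
for statement in comment_statements("loans", docs):
    print(statement)
```

The same dictionary of descriptions could also be rendered as a standalone document, so the comments and any published data dictionary stay in sync.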

Resource Management Visit


In app reporting

Dennis began with a discussion of a dashboard for finance.  Seems more appropriate for FOLIO as a whole.  Planning a summary widget for finance and acquisitions.  The widget will calculate and display summary information based on search results.  Focus is primarily on daily applications.  Cross-app reports will use the LDP.


See notes in In-App Reports - Dennis Bridges spreadsheet of RM reports


Evaluation of reports

Discussion about LDP not having real-time data.

Export based on search report.

Group record to display in current widget functionality

UXPROD-701?  Needs to be more defined.

Finance app has been delayed.  Trend information in LDP.

Do some reports in both the data warehouse and as an in-app report

Workflow engine

LDP for table join to external

Metadata Management Visit


Complex queries.  Transition of the data from apps to the LDP or LDP to apps is not defined.  Maybe the workflow engine or an in-app report can produce the results.  Agreement that the workflow engine should be the mechanism.

Clarification of LDP vs. data warehouse: they are the same thing.

Hoping for incorporation of reporting within sprint reviews and roadmap.

Some in-app reports originally identified have now moved to the LDP.

Need for development resources

Is the data model a remodeling of data?  Interpretation across two different places.  Using Star Schema to reorganize the data to make it easier. 

Concerns about transferring large amounts of data.  Now not possible.

Discussion about the structure and ability to create local tables in the LDP.  Some examples of local tables in existing systems.

MARC is being converted to JSON (already being done elsewhere in FOLIO).  Specify nested data?  Reporting SIG is not looking at non-relational options for MARC.  There will be a manual mapping of some of the elements.

Review of reports

Review process of Reporting SIG to generate reports.  Reviewed SQL for a sample report Shelf-list Location.  Initial documentation to figure out which attributes are needed to create a data model. Reviewed the Star Schema for the LDP.

All tables will have a tenant ID.

Report on the process that makes changes: how do you report on operational activity?  In-app report.

Action items

  •  add a user story to the JIRA issue about storing effective location in loan (UXPROD-1432)
  •  Sharon will follow up with Dennis and Claudius on the remaining reports not reviewed at F2F