WOLFcon early bird registration is open until June 18th. The call for proposals is open until June 5th, with the aim of having an agenda ready before early bird registration closes. Draft proposals are okay; they can be revised later.

FOLIO Council election results should be coming out soon after the Community Council meeting on Monday.

Data Import: Current State to Desired State

Background documents:

ARLEF Data Import Report, 2023-04-12

EBSCO's Report to the ARLEF Report, 2023-04-24

Report on Data Import, by Corrie Hutchinson, 2023-05-05

EBSCO's Update on Data Import Troubleshooting, 2023-05-23

From Data Import/MM:

Desired outcomes:

  • Shared understanding of severe shortcomings of data import
  • Shared understanding of next steps for addressing data import
  • Shared understanding of timeline for improvement
  • Determination of how data import remediation will be prioritized

This comes out of various documents and the meeting in Stanford, and includes two reports from EBSCO.

Problem statement: we don't have a reliable and performant solution for loading records to support daily workflows and monthly/quarterly workflows. (Leaving aside system migrations for the moment.) Loads of fewer than 1,000 records can time out. Libraries are unable to manage basic workflows such as loading electronic records. FOLIO users must plan to work off-hours to complete import jobs.

Slide 6 describes in-progress efforts across these topics: architecture, infrastructure, development, and product management.

"Large" loads are about 100,000 records, and there need to be plans to address loads of that size and much smaller sizes. Also noting that some types of loads are more performant than other types, so it isn't necessarily about the number of records. "Chunking" can also allow for smaller loads to run interspersed with a larger load. (Single record imports jump the queue right now.)

In addition to the documentation prepared, interviews with current users of FOLIO help make the issues more understandable. (There is a governance question of who the "customer" or user is, and whether that includes all hosting providers and those not using a hosting provider.)

Performance is one issue, as are the capabilities of the logic that is part of data import.

Questions from the meeting include how bugs are prioritized and whether there can be a dashboard of work that is happening on Data Import. (A dashboard is in the works.) Libraries need to be able to know when the functionality will be implemented so they can plan for their local workflows, including when functionality is deemed out-of-scope.

Short term: complete work in progress and address critical production needs (continue performance improvements and reliability/scalability improvements). Collaborate with other hosting providers and self-hosted institutions. Provide realistic performance benchmarks. Define an approach for addressing failed records and logging issues.

Mid-term: review the data import roadmap; continue architecture and infrastructure improvements; carry out development to address failed records and logging issues.

Longer-term (12 months): continue architecture, infrastructure, and development improvements.

Proposal for release changes: extend the Poppy release to the current Q-release date (November 2023). The decision is driven by functionality (other than data import) that is not ready.

Proposal for combined Poppy/Quesnelia release

See notes from Release Management Stakeholders

Further discussion will happen over Slack, with the aim of reaching a decision on Monday.
00:14:04    Christopher Spalding (EBSCO):    Kristin, we'll send a link to this deck so it can be added to the documentation.
00:31:21    Charlotte Whitt:    Those P1/P2 bug tickets are they all tagged with the label `support` - and monitored by the Support SIG?
00:31:48    Erin Nettifee:    Related - is there a jira dashboard for this work in progress that could be shared?
00:37:29    Felix Hemme:    Why can't multiple jobs run in parallel?
00:38:44    Jenn Colt:    We have provided such stats repeatedly. The DI wiki page has suggested standards with large goal = 100k
00:40:50    Erin Nettifee:    Is there a document anywhere that lists the requirements for data import and what has been implemented and what hasn't? I'm aware of but it is so high-level, and where the issues really come out is in the nitty gritty of what data import needs to be able to do.
00:41:37    Erin Nettifee:    e.g., "Update SRS MARC BIB via Data Import" is so general, and doesn't help people understand what mapping is possible, what functions are supported under that umbrella, etc.
00:41:40    Christie Thomas (she/her):    And large during the work day / peak hours is different from large during off hours.
00:41:56    Erin Nettifee:    I think we have a very weak understanding across the board of what the app is actually supposed to be doing.
00:42:42    Erin Nettifee:    I helped with documentation discussions around DI, and there are so many instances of "Wait, I thought that was implemented?" … concrete requirements that could be referenced would be so helpful. Even if we have to go back and drag through years of development work to articulate it.
00:43:00    Tod Olson:    So "large" is a function of both record set size and complexity of import profile?
00:43:25    Jenn Colt:    Replying to "I helped with docume..."

Part of that is because these discussions start over and over with information degradation.
00:43:35    Jenn Colt:    Replying to "I helped with docume..."

Fractal telephone
00:50:40    Charlotte Whitt:    The paper Corrie Hutchinson has written up, reflects customers expectations.
00:51:15    Charlotte Whitt:    Will these customer interviews/feedback build upon this document, or start the process all over again
00:52:22    Thomas Trutt:    I would suggest starting over, clean slate of expectations, and see how the current system address those needs and where the gaps are. Otherwise you get into the issue of this needs fixed instead of requirements.
00:52:47    Jenn Colt:    Can we please stop and define customers?
00:55:25    Erin Nettifee:    Replying to "I would suggest star..."

That is what is missing - the list of requirements that the app is supposed to meet. Without that, it is very confusing when someone is looking at a thing that doesn't work. You can't know where to start if you don't know if the app is supposed to be able to do the thing that didn't work.
00:57:48    Thomas Trutt:    But that flow logic sounds like a base requirement.
01:00:21    Jenn Colt:    Burn it down is always appealing in theory
01:01:54    Thomas Trutt:    Would it make sense to simplify and streamline DI and allow for plugins, that add functionality, instead of trying to put all the features into one container?
01:03:24    Owen Stephens:    I think it's fair to say that bug prioritisation is an art rather than a science. The PO (usually the PO at least) has to balance many factors when prioritising bugs and scheduling work
01:05:26    Anya:    Also please come to the support sig - every Monday
01:10:17    Jenn Colt:    Sharing the link would be wonderful
01:12:18    Lisa McColl:    Yes - nice to see the "Limited functionality" row.
01:13:16    Khalilah Gambrell (EBSCO):    MIRO -
01:15:17    Thomas Trutt:    If other institutions are using external tools is it worth while to look at what they are doing and adopt that?
01:16:42    Charlotte Whitt:    The German community uses mod-inventory-update
01:16:48    Owen Stephens:    Who is "we" in that context? I'm rather unclear what the PC can do beyond banging the drum about the issue?
01:16:54    Erin Nettifee:    But that doesn't support MARC, right Charlotte?
01:17:11    Khalilah Gambrell (EBSCO):    We = Community
01:18:31    Charlotte Whitt:    Can support MARC and any format - but will require setting up the data flow
01:19:42    Charlotte Whitt:    Reacted to "But that doesn't sup..." with ❓
01:20:29    Aaron Neslin:    +1 - I appreciate the steps that have been taken, but we've been pointing out these issues for literally years
01:21:39    Jenn Colt:    Understanding how you will know would be a great step. What is the discovery? And how is it different from the last two years of discovery?
01:24:46    Anya:    University of the Arts is now the newest library to go live on Full FOLIO
01:25:01    Jenn Colt:    I watched Picard over the last week and was reminded of DI: "Change always comes later than we think it should."
01:28:42    Owen Stephens:    This is a big decision to make so late in the release cycle
01:30:02    Julie Bickle:    +1 Owen
01:31:37    Martina Schildt | VZG:    Completely agree Owen
01:31:48    Thomas Trutt:    In general I think less releases per year is a good idea. The time spent on Bugfest and squashing those bugs could be put into more dev cycles.
01:33:36    Charlotte Whitt:    Some libraries have postponed upgrade to Orchid, and then planned to do Poppy
01:35:20    Owen Stephens:    I will also note that our development plans would have been very different if we were not planning to meet the poppy deadline
01:37:50    Charlotte Whitt:    Similar for the work we do for Mainz (Gutenberg) and Hebis (Odin)
01:39:50    Owen Stephens:    Who will make the final call and when?
01:40:00    Wayne Schneider:    @harry that is a little more complicated than you are suggesting due to dependencies in both FOLIO interfaces and UI libraries.
01:40:37    Owen Stephens:    And the new CSP does not allow for non P1/P2 to be addressed via service releases
01:40:48    Charlotte Whitt:    Maybe we could do a Poppy Service Patch - not just bugs, but real work too
