2023-02-08 Meeting notes: Airflow for bibliographic workflows

Date

Housekeeping

Discussion items

  1. Airflow for "bibliographic workflows" | jpnelson at Stanford | until 12:30 PM ET
  2. AI SIG survey

Minutes

Airflow for "bibliographic workflow"

  • open source project
  • widely used - often in big data or machine learning
  • Stanford used Airflow for other workflows before adopting it for FOLIO
  • for FOLIO it was used for data migration, mainly to support the bibliographic workflow
  • about 10 million MARC records from the previous ILS
  • what "workflow" means in this context: Airflow constructs a workflow from discrete tasks
  • consists of DAGs | DAG: Directed Acyclic Graph (one direction = non-looping code), each of which completes a single specific task
  • tasks are grouped into task groups
  • instances are posted first, then holdings, and then items
  • when finished → a separate process runs → it checks whether all records went into FOLIO
  • Owen in chat: So have I understood this correctly: the “workflow” is a chain of “DAGs” and the DAGs made up of non-looping code which complete a single specific task?
    • Jeremy: kind of, a DAG is a separate workflow, different DAGs are connected for migration
  • Charlotte in chat: Jeremy, is this work flow only intended for migration, or are you planning to extend it to also be used in production, maybe as a Data Import work flow?
    • answer to follow (see the recording at approx. minute 22)
  • Owen in chat: Could you see this being used to bridge application within Folio? (e.g. if X happens in module Y, do A in module B?)
    • Jeremy: yes
  • Charlotte in chat: Do you have a link to the GitHub documentation?
  • Martina S: what can we do with the gained knowledge?
    • Owen: some similarity between some of the tools
    • describing different scenarios and why the tools work seems important
    • e.g. why is a dataflow tool like Airflow useful for data loading
    • what are the differences and what are the similarities
  • building structured data flows vs. using an external tool to integrate with FOLIO (request demo)
  • Maura: different tools for different kinds of processes
    • the presentations and the knowledge shared give attendees a lot to take back to their institutions
  • Owen: can we make the solutions easier for the community to adopt?
    • do we need to work on making integration easier?
    • do we need SIGs for the different tools or a workflow SIG?
  • Maura: would be good for more libraries to get the benefit of the presentations
    • maybe have a Slack channel where people can share tools and experiences
    • we should avoid re-inventing the wheel again and again
  • Martina S.: should we ask Duke to present on workflow management with Trello?
    • yes, or other institutions using Trello
    • or MS Teams
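
As a rough illustration of the DAG idea from the presentation notes above (discrete tasks, no loops, instances before holdings before items, then a separate check that everything arrived), here is a minimal sketch using only Python's standard library. It is not Airflow code and not Stanford's pipeline (see the libsys-airflow repository linked in the chat for that). The task names, the in-memory record lists, and the audit report format are all hypothetical and chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names, each mapped to the set of tasks it depends on.
DEPENDENCIES = {
    "post_instances": set(),
    "post_holdings": {"post_instances"},
    "post_items": {"post_holdings"},
    "audit_loaded_records": {"post_items"},
}


def execution_order(dependencies):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def is_acyclic(dependencies):
    """A graph of tasks is only a DAG if it contains no loops."""
    try:
        execution_order(dependencies)
    except CycleError:
        return False
    return True


def run_migration(source):
    """Copy records into an in-memory stand-in for the target system, then audit.

    Returns, per record type, how many source records are missing in the target.
    """
    loaded = {"instances": [], "holdings": [], "items": []}
    report = {}

    def audit():
        # Separate final step: compare source and target counts.
        for kind in loaded:
            report[kind] = len(source[kind]) - len(loaded[kind])

    tasks = {
        "post_instances": lambda: loaded["instances"].extend(source["instances"]),
        "post_holdings": lambda: loaded["holdings"].extend(source["holdings"]),
        "post_items": lambda: loaded["items"].extend(source["items"]),
        "audit_loaded_records": audit,
    }
    for name in execution_order(DEPENDENCIES):
        tasks[name]()
    return report
```

Calling `execution_order(DEPENDENCIES)` yields the four tasks in dependency order, and `is_acyclic` returns `False` for a graph such as `{"a": {"b"}, "b": {"a"}}`, which is the "non-looping" property mentioned in the notes. A real orchestration tool adds scheduling, retries, and monitoring on top of this basic ordering.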

AI SIG survey

  • to improve the overall experience, I would ask you to answer the linked survey, which will only take a few minutes
  • the survey will be open until Feb 22nd

Chat

18:13:00 From Owen Stephens to Everyone:
    So have I understood this correctly: the “workflow” is a chain of “DAGs” and the DAGs made up of non-looping code which complete a single specific task?
18:18:50 From Charlotte Whitt to Everyone:
    Jeremy, is this work flow only intended for migration, or are you planning to extend it to also be used in production, maybe as a Data Import work flow?
18:19:40 From Owen Stephens to Everyone:
    +1 to that question Charlotte
18:23:01 From Owen Stephens to Everyone:
    Could you see this being used to bridge application within Folio? (e.g. if X happens in module Y, do A in module B?)
18:25:26 From Owen Stephens to Everyone:
    Thanks Jeremy
18:25:40 From Maura Byrne to Everyone:
    This is great.
18:25:53 From Charlotte Whitt to Everyone:
    It looks really amazing
18:26:48 From Charlotte Whitt to Everyone:
    Do you have a link to the GitHub documentation?
18:27:55 From Jeremy Nelson to Everyone:
    https://github.com/sul-dlss/libsys-airflow
18:28:06 From Charlotte Whitt to Everyone:
    Thanks a lot
18:28:46 From Owen Stephens to Everyone:
    It seems not completely dissimilar to the demonstration from last time
18:29:14 From Owen Stephens to Everyone:
    I’m just trying to remember what that was called :O
18:29:54 From Owen Stephens to Everyone:
    Thanks Martina - yes it seems similar to Prefect
18:30:31 From Kristin Martin to Everyone:
    https://folio-org.atlassian.net/wiki/display/REL/Team+vs+module+responsibility+matrix
18:45:10 From Owen Stephens to Everyone:
    100% on avoiding re-inventing the wheel
18:47:28 From Kristin Martin to Everyone:
    Or MS Teams
18:48:05 From Owen Stephens to Everyone:
    The “learning apis” slack channel has questions that come up in similar way, but I like the idea of an “Automation approaches” channel
18:48:18 From Maura Byrne to Everyone:
    +1
18:48:52 From Owen Stephens to Everyone:
    In fact some of the questions that come up in Learning APIs are absolutely automation questions
18:49:28 From Owen Stephens to Everyone:
    The discussion happening in there today is an example

Transcript

Future topics

  • Topic proposal by Owen Stephens for October:
    • Use of shortcut keys and macros for more effective cross-app working - it would also be good to have UX and Stripes/dev knowledge for this discussion I think. I know @Laura (she/they) uses macros so might have insights into the potential for cross-app working
    • Potential for external 'workflow' solutions for cross-app interactions
      • I think 'workflow' is a dangerous term here - in this context it's more about automation than user workflows, although I think there is overlap
      • I was particularly struck by the solution in production at TAMU (Jeremy Huff and Sebastian Hammer presented, the recording is at https://prod-zoom-recordings-openlibraryfoundation-org.s3.amazonaws.com/50dc6c87-3912-43fa-8287-56ec73b12bbb%2Fshared_screen_with_speaker_view%28CC%29.mp4 starting at 3 hrs, 14 min) - I think getting someone from TAMU to talk about how this is used would be v interesting (tick)
      • There was also a presentation on the use of a tool called Airflow at Stanford for "bibliographic workflow" but I've not watched that yet so not 100% sure if it is completely applicable - I think the core use case there was systems migration but it may go beyond that (tick)
      • Jenn Colt on using Prefect (tick)
      • does not need to be workflow across apps
  • UX/UI and implementers topics
    • should be Wednesdays
  • Comprehensive look at where data is copied and stored as opposed to live data | how it is represented
  • Date filters and how they work in different apps

Attendees

Present | Name | Home Organization
  | Brooks Travis | EBSCO
x | Charlotte Whitt | Index Data
  | Dennis Bridges | EBSCO
x | Dung-Lan Chen | Skidmore College
  | Erin Nettifee | Duke
  | Gill Osguthorpe | UX/UI Designer - K-Int
  | Heather McMillan Thoele | TAMU
  | Ian Ibbotson | Developer Lead - K-Int
  | Jag Goraya | K-Int
x | Jana Freytag | VZG, Göttingen
  | Jenn Colt | Cornell
  | Khalilah Gambrell | EBSCO
  | Kimberly Pamplin | TAMU
  | Kirstin Kemner-Heek | VZG, Göttingen
  | Kristin Martin | Chicago
  | Laura Daniels | Cornell
  | Lloyd Chittenden | Marmot Library Network
  | Marc Johnson | K-Int
x | Martina Schildt | VZG, Göttingen
x | Martina Tumulla | hbz, Cologne
x | Maura Byrne | Chicago
  | Mike Gorrell | Index Data
x | Owen Stephens | Product Owner - Owen Stephens Consulting
  | Patty Wanninger | Product owner Users app
  | Rachel A Sneed | TAMU
  | Sara Colglazier | Five Colleges / Mount Holyoke College Library
x | Susanne Schuster | BSZ Konstanz
  | John Coburn | EBSCO
  | Zak Burke | EBSCO
  | Daniel Huang | Lehigh
x | Maccabee Levine | Lehigh
  | Robert Scheier | Holy Cross
x | Jeremy Nelson | Stanford
x | Ingolf Kuss | hbz

Action items

  •