2022-11-29 Discovery Integration Subgroup Meeting notes

Date

EDT 10:00am to 11:00am


Goals

Discussion items

Time

Item

Who

Notes

10:00 Start of the meeting

10:05

requests currently not covered by an API for discovery systems (e.g. trending items)


See Brainstorm page for the diverse interface
10:15

start of discussion about data export (OAI-PMH)


Update from Magda via Slack:

Hi all, I would like to follow up on the OAI-PMH discussion from last week's meeting.

Harvesting holdings and items data
FOLIO's OAI-PMH does support two metadataPrefixes:

  • marc21 - harvests only SRS MARC records
  • marc21_withholdings - enriches the SRS record with holdings and items fields as described in MODOAIPMH-102
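
As a minimal sketch of how a harvester would choose between the two prefixes, the snippet below builds standard OAI-PMH ListRecords request URLs. The base URL is a hypothetical placeholder, and authentication (for example an API key required by an edge module) is deliberately left out; only the `verb`, `metadataPrefix`, and `from` arguments come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; substitute your own OAI-PMH base URL.
BASE_URL = "https://folio.example.org/oai"


def list_records_url(metadata_prefix, base_url=BASE_URL, from_date=None):
    """Build an OAI-PMH ListRecords request URL for the given metadataPrefix."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective-harvest argument defined by OAI-PMH.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# SRS MARC records only.
print(list_records_url("marc21"))
# SRS MARC records enriched with holdings and items fields.
print(list_records_url("marc21_withholdings", from_date="2022-11-01"))
```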


Including inventory data
The Orchid release will also include harvesting of records that do not have an underlying source record; the work is covered in UXPROD-2404.

Stabilization issues:
OAI-PMH indeed had a rough start and required some additional stabilization work. However, starting with the Kiwi release we have regularly harvested bugfest data (~8 M records in approximately 11 hours). The issues we still see are related to multiple full harvests running concurrently (not supported) or to bad data in inventory. We have significantly improved monitoring of the harvest, which can be done through API calls as described in: https://github.com/folio-org/mod-oai-pmh#harvesting-statistics-api

You can also find more information about FOLIO's implementation of OAI-PMH  in: https://folio-org.atlassian.net/wiki/display/FOLIOtips/OAI-PMH+Best+Practices

Unfortunately, I won't be able to attend this week's meeting either, but I will listen to the recording and respond here. Also, I should be able to attend the meeting on December 6th if you think that would be helpful.


Discussion:

Villanova has used OAI-PMH to index a couple million records in their test system. Some records get dropped, e.g. due to bad leaders or illegal control characters that cannot occur in XML, so it is important to watch the export counts and errors. (This can usually be dealt with by fixing the records.)
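
One way to find such records before a harvest is to scan field values for characters outside the XML 1.0 character range. The sketch below is an illustration only, not Villanova's or FOLIO's actual tooling; the function names and the sample title are made up for the example.

```python
import re

# Matches any character NOT permitted by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (position, code point) pairs for characters XML 1.0 cannot carry."""
    return [(m.start(), hex(ord(m.group()))) for m in _ILLEGAL_XML_CHARS.finditer(text)]


def strip_illegal_xml_chars(text):
    """Return the text with XML-illegal characters removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# Hypothetical field value containing a unit separator and a NUL byte.
title = "Annual report\x1f 2021\x00"
print(find_illegal_xml_chars(title))
print(strip_illegal_xml_chars(title))
```

Reporting the positions rather than silently stripping makes it easier to fix the source records, which is the remedy suggested in the discussion.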

Noted the stabilization work mentioned above.

Is there a real use case for multiple full harvests to different systems simultaneously?

Currently we have Five Colleges, which are on a multi-tenant system, but all have EDS, so EBSCO can apply the full export to all. (Currently harvest about 5.2 million in 11 hours, down from 36 hours over the summer.)
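
To put these figures in perspective, the arithmetic can be sketched as below. The speed-up calculation assumes the 36-hour and 11-hour runs covered a comparable number of records, which the notes do not state.

```python
def records_per_second(record_count, hours):
    """Average harvest throughput in records per second."""
    return record_count / (hours * 3600)


# Five Colleges full export: about 5.2 million records in 11 hours.
five_colleges_rate = records_per_second(5_200_000, 11)

# Bugfest harvest mentioned in the Slack update: about 8 million records in 11 hours.
bugfest_rate = records_per_second(8_000_000, 11)

# Improvement over the summer, assuming a comparable record count for both runs.
speedup = 36 / 11

print(f"Five Colleges: {five_colleges_rate:.0f} records/s")
print(f"Bugfest: {bugfest_rate:.0f} records/s")
print(f"Speed-up since summer: {speedup:.1f}x")
```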

Possible use case: you do not control the schedule of all the systems that harvest from you. The system doesn't handle that well right now.

[Magda Zacharska]: This should be addressed in UXPROD-3772 Implement Retry-after property for OAI-PMH response
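
On the harvester side, honoring such a response could look roughly like the sketch below. It only shows how a client might interpret a standard HTTP Retry-After header value (either a number of seconds or an HTTP date); how FOLIO will send it under UXPROD-3772 is not described in these notes, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None):
    """Convert a Retry-After header value into a non-negative wait in seconds."""
    value = header_value.strip()
    if value.isdigit():
        # Delay given directly as a number of seconds.
        return float(value)
    # Otherwise treat the value as an HTTP date.
    now = now or datetime.now(timezone.utc)
    retry_at = parsedate_to_datetime(value)
    if retry_at.tzinfo is None:
        # Assumption: a date without zone information is interpreted as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


# Example: the harvester receives the response at 15:00 UTC.
received = datetime(2022, 11, 29, 15, 0, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120"))
print(retry_after_seconds("Tue, 29 Nov 2022 15:10:00 GMT", now=received))
```

A harvester would sleep for the returned number of seconds and then repeat the same request.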

One solution might be to harvest to an intermediary system like VuFind and then do the multiple harvests from that.

Q: Does the FOLIO OAI-PMH support multiple formats, or only one, i.e. just MARCXML or just Dublin Core? (We think just Dublin Core.)

A: [Magda Zacharska] FOLIO supports both: MARCXML and Dublin Core

Q: Is there a need to support more formats or more verbs? It currently supports GetRecord but does not respect the format (see above), which complicates troubleshooting.

A: [Magda Zacharska] FOLIO supports:  

  • GetRecord – Used to retrieve an individual metadata record.
  • Identify – Used to retrieve repository information (ex. name, version).
  • ListIdentifiers – Used to retrieve only headers.
  • ListMetadataFormats – Used to retrieve the available metadata formats.
  • ListRecords – Used to retrieve actual item metadata records

ListSets, used to retrieve the set structure of a repository, will be implemented in UXPROD-2439 (currently planned for the Poppy release).
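
For anyone troubleshooting these verbs, the sketch below parses a ListIdentifiers response with the Python standard library. The sample XML is hand-written for illustration and is not actual FOLIO output; the namespace and the resumptionToken element come from the OAI-PMH 2.0 protocol, where a non-empty token means more pages remain.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response, not captured from a real repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:record-1</identifier>
      <datestamp>2022-11-01T00:00:00Z</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:record-2</identifier>
      <datestamp>2022-11-02T00:00:00Z</datestamp>
    </header>
    <resumptionToken>token-page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_list_identifiers(xml_text):
    """Return ([(identifier, is_deleted), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        (header.findtext("oai:identifier", namespaces=OAI_NS),
         header.get("status") == "deleted")
        for header in root.iterfind(".//oai:header", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    # An absent or empty token means the list is complete.
    return identifiers, (token.strip() or None)


identifiers, token = parse_list_identifiers(SAMPLE_RESPONSE)
print(identifiers)
print(token)
```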

The bug where GetRecord did not support the marc21_withholdings metadataPrefix (MODOAIPMH-426) was resolved in the Morning Glory release.

One issue is that the edge module hits inventory every time it needs to build a list; that is the performance Achilles heel.

10:40 How to move forward with these?
  • Divide into sets where subgroups could meet so not everyone needs to look at all issues.
  • Triage: what is most important? what could be done today? what could wait?
  • Could there be a subgroup that does discovery not using OAI-PMH?

Ideas / Decision (draft):

  • Identify different subgroups/individuals for the different areas and then triage within each area.
  • Add a column to the Brainstorm pages to sign up to triage individual issues.
  • Use an upcoming meeting time for subgroups to meet.

11:00 End of the meeting