Spike: MODDATAIMP-361 Investigation: Import MARC Authority records

Goal and requirements

The goals of importing MARC Authority records into Source Record Storage (SRS) are:

  1. Create new MARC authority records
  2. Update/Modify existing MARC authority records using matching/mapping profiles
  3. Search imported MARC authority records using the SRS MARC Query API

Requirements

  1. Supported file extensions/formats: All files with Data type = MARC
  2. Honor MARC Field protection entries
  3. Support mapping profiles to populate authority records
  4. Support match profiles that allow matching MARC authority records
  5. Support job profiles to execute import requests
  6. Import actions to support:
    1. Create record
    2. Update entire record
    3. Modify record (not for initial release)
    4. Delete record (not for initial release)
  7. When a user imports records:
    1. generate an HRID
    2. generate a UUID
    3. store the imported authority records in SRS

Creation of MARC authority records

Creating new MARC Authority records should be similar to creating new MARC Bib records, because both record types share the same format, which SRM/SRS already accept. Extensions that customize MARC Bib record creation are implemented either in external modules (such as mod-inventory) or behind general interfaces (such as EventProcessor). It can therefore be assumed that MARC Authority record creation should follow the same pattern as MARC Bib. The following is a very high-level review of the existing creation flow.

High level description of existing MARC Bib records creation flow

  1. SRM receives a request to process a batch of MARC records in raw format (JSON/MARC/XML)
    • entry point: EventDrivenChunkProcessingServiceImpl
    • the existing job execution entry is initialized if necessary and its status is set to PARSING_IN_PROGRESS
    • incoming records are parsed from raw format and kept in objects of Record type with other supplemental data
    • DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED event is sent to notify that raw records have been parsed
    • journal records are created for the parsed records

  2. SRS retrieves parsed records from the appropriate event topic and saves them to the MARC records database

    • entry point: ParsedMarcChunksKafkaHandler
    • parsed records transformed into DB representation and saved
    • raw records saved
    • initial generation created
    • DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED event is sent to notify that new records have been saved

  3. SRM gets saved records from the event topic and notifies other consumers that the records are created

    • entry point: StoredMarcChunksKafkaHandler
    • DI_SRS_MARC_BIB_RECORD_CREATED event is sent for each parsed record with the following payload

context property contains:

      • encoded Record object mapped to MARC_BIBLIOGRAPHIC entity type
Sample Record with MARC Bib
{
	"id":"f9d76822-ae0e-4a65-9b3b-47b7e991e5d0",
	"snapshotId":"06efa5b0-6d59-41bc-8207-801df6fbf22f",
	"matchedId":"f9d76822-ae0e-4a65-9b3b-47b7e991e5d0",
	"generation":0,
	"recordType":"MARC",
	"rawRecord":{
		"id":"f9d76822-ae0e-4a65-9b3b-47b7e991e5d0",
		"content":"01240cas a2200397 4500001000700000005001700007008004100024010001700065022001400082035002600096035002200122035001100144035001900155040004400174050001500218082001100233222004200244245004300286260004700329265003800376300001500414310002200429321002500451362002300476570002900499650003300528650004500561655004200606700004500648853001800693863002300711902001600734905002100750948003700771950003400808\u001e366832\u001e20141106221425.0\u001e750907c19509999enkqr p 0 a0eng d\u001e \u001fa 58020553 \u001e \u001fa0022-0469\u001e \u001fa(CStRLIN)NYCX1604275S\u001e \u001fa(NIC)notisABP6388\u001e \u001fa366832\u001e \u001fa(OCoLC)1604275\u001e \u001fdCtY\u001fdMBTI\u001fdCtY\u001fdMBTI\u001fdNIC\u001fdCStRLIN\u001fdNIC\u001e0 \u001faBR140\u001fb.J6\u001e \u001fa270.05\u001e04\u001faThe Journal of ecclesiastical history\u001e04\u001faThe Journal of ecclesiastical history.\u001e \u001faLondon,\u001fbCambridge University Press [etc.]\u001e \u001fa32 East 57th St., New York, 10022\u001e \u001fav.\u001fb25 cm.\u001e \u001faQuarterly,\u001fb1970-\u001e \u001faSemiannual,\u001fb1950-69\u001e0 \u001fav. 1- Apr. 1950-\u001e \u001faEditor: C. W. Dugmore.\u001e 0\u001faChurch history\u001fxPeriodicals.\u001e 7\u001faChurch history\u001f2fast\u001f0(OCoLC)fst00860740\u001e 7\u001faPeriodicals\u001f2fast\u001f0(OCoLC)fst01411641\u001e1 \u001faDugmore, C. W.\u001fq(Clifford William),\u001feed.\u001e03\u001f81\u001fav.\u001fi(year)\u001e40\u001f81\u001fa1-49\u001fi1950-1998\u001e \u001fapfnd\u001fbLintz\u001e \u001fa19890510120000.0\u001e2 \u001fa20141106\u001fbm\u001fdbatch\u001felts\u001fxaddfast\u001e \u001flOLIN\u001faBR140\u001fb.J86\u001fh01/01/01 N\u001e\u001d"
	},
	"parsedRecord":{
		"id":"f9d76822-ae0e-4a65-9b3b-47b7e991e5d0",
		"content":"{\"leader\":\"01338cas a2200409 4500\",\"fields\":[{\"001\":\"in00000000001\"},{\"008\":\"750907c19509999enkqr p 0 a0eng d\"},{\"005\":\"20210213170746.7\"},{\"010\":{\"subfields\":[{\"a\":\" 58020553 \"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"022\":{\"subfields\":[{\"a\":\"0022-0469\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"035\":{\"subfields\":[{\"a\":\"(CStRLIN)NYCX1604275S\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"035\":{\"subfields\":[{\"a\":\"(NIC)notisABP6388\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"035\":{\"subfields\":[{\"a\":\"366832\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"035\":{\"subfields\":[{\"a\":\"(OCoLC)1604275\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"040\":{\"subfields\":[{\"d\":\"CtY\"},{\"d\":\"MBTI\"},{\"d\":\"CtY\"},{\"d\":\"MBTI\"},{\"d\":\"NIC\"},{\"d\":\"CStRLIN\"},{\"d\":\"NIC\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"050\":{\"subfields\":[{\"a\":\"BR140\"},{\"b\":\".J6\"}],\"ind1\":\"0\",\"ind2\":\" \"}},{\"082\":{\"subfields\":[{\"a\":\"270.05\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"222\":{\"subfields\":[{\"a\":\"The Journal of ecclesiastical history\"}],\"ind1\":\"0\",\"ind2\":\"4\"}},{\"245\":{\"subfields\":[{\"a\":\"The Journal of ecclesiastical history.\"}],\"ind1\":\"0\",\"ind2\":\"4\"}},{\"260\":{\"subfields\":[{\"a\":\"London,\"},{\"b\":\"Cambridge University Press [etc.]\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"265\":{\"subfields\":[{\"a\":\"32 East 57th St., New York, 10022\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"300\":{\"subfields\":[{\"a\":\"v.\"},{\"b\":\"25 cm.\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"310\":{\"subfields\":[{\"a\":\"Quarterly,\"},{\"b\":\"1970-\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"321\":{\"subfields\":[{\"a\":\"Semiannual,\"},{\"b\":\"1950-69\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"362\":{\"subfields\":[{\"a\":\"v. 1- Apr. 1950-\"}],\"ind1\":\"0\",\"ind2\":\" \"}},{\"570\":{\"subfields\":[{\"a\":\"Editor: C. W. 
Dugmore.\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"650\":{\"subfields\":[{\"a\":\"Church history\"},{\"x\":\"Periodicals.\"}],\"ind1\":\" \",\"ind2\":\"0\"}},{\"650\":{\"subfields\":[{\"a\":\"Church history\"},{\"2\":\"fast\"},{\"0\":\"(OCoLC)fst00860740\"}],\"ind1\":\" \",\"ind2\":\"7\"}},{\"655\":{\"subfields\":[{\"a\":\"Periodicals\"},{\"2\":\"fast\"},{\"0\":\"(OCoLC)fst01411641\"}],\"ind1\":\" \",\"ind2\":\"7\"}},{\"700\":{\"subfields\":[{\"a\":\"Dugmore, C. W.\"},{\"q\":\"(Clifford William),\"},{\"e\":\"ed.\"}],\"ind1\":\"1\",\"ind2\":\" \"}},{\"853\":{\"subfields\":[{\"8\":\"1\"},{\"a\":\"v.\"},{\"i\":\"(year)\"}],\"ind1\":\"0\",\"ind2\":\"3\"}},{\"863\":{\"subfields\":[{\"8\":\"1\"},{\"a\":\"1-49\"},{\"i\":\"1950-1998\"}],\"ind1\":\"4\",\"ind2\":\"0\"}},{\"902\":{\"subfields\":[{\"a\":\"pfnd\"},{\"b\":\"Lintz\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"905\":{\"subfields\":[{\"a\":\"19890510120000.0\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"948\":{\"subfields\":[{\"a\":\"20141106\"},{\"b\":\"m\"},{\"d\":\"batch\"},{\"e\":\"lts\"},{\"x\":\"addfast\"}],\"ind1\":\"2\",\"ind2\":\" \"}},{\"950\":{\"subfields\":[{\"l\":\"OLIN\"},{\"a\":\"BR140\"},{\"b\":\".J86\"},{\"h\":\"01/01/01 N\"}],\"ind1\":\" \",\"ind2\":\" \"}},{\"999\":{\"subfields\":[{\"s\":\"f9d76822-ae0e-4a65-9b3b-47b7e991e5d0\"},{\"i\":\"f8c62933-95ab-4a16-bbe5-5b27558c758a\"}],\"ind1\":\"f\",\"ind2\":\"f\"}}]}",
		"formattedContent":"LEADER 01293cas a2200409 4500\n001 366832\n005 20141106221425.0\n008 750907c19509999enkqr p 0 a0eng d\n010 $a 58020553 \n022 $a0022-0469\n035 $a(CStRLIN)NYCX1604275S\n035 $a(NIC)notisABP6388\n035 $a366832\n035 $a(OCoLC)1604275\n040 $dCtY$dMBTI$dCtY$dMBTI$dNIC$dCStRLIN$dNIC\n050 0 $aBR140$b.J6\n082 $a270.05\n222 04$aThe Journal of ecclesiastical history\n245 04$aThe Journal of ecclesiastical history.\n260 $aLondon,$bCambridge University Press [etc.]\n265 $a32 East 57th St., New York, 10022\n300 $av.$b25 cm.\n310 $aQuarterly,$b1970-\n321 $aSemiannual,$b1950-69\n362 0 $av. 1- Apr. 1950-\n570 $aEditor: C. W. Dugmore.\n650 0$aChurch history$xPeriodicals.\n650 7$aChurch history$2fast$0(OCoLC)fst00860740\n655 7$aPeriodicals$2fast$0(OCoLC)fst01411641\n700 1 $aDugmore, C. W.$q(Clifford William),$eed.\n853 03$81$av.$i(year)\n863 40$81$a1-49$i1950-1998\n902 $apfnd$bLintz\n905 $a19890510120000.0\n948 2 $a20141106$bm$dbatch$elts$xaddfast\n950 $lOLIN$aBR140$b.J86$h01/01/01 N\n999 ff$sf9d76822-ae0e-4a65-9b3b-47b7e991e5d0\n\n"
	},
	"deleted":false,
	"order":0,
	"externalIdsHolder":{
		"instanceId":"f8c62933-95ab-4a16-bbe5-5b27558c758a",
		"instanceHrid":"in00000000001"
	},
	"additionalInfo":{
		"suppressDiscovery":false
	},
	"state":"ACTUAL"
}
      • mapping rules

Sample mapping rules

      • mapping parameters

Sample mapping params

  4. SRS consumes each created record and applies the registered event processors

    • entry point: DataImportKafkaHandler
    • registered event processors applied if applicable to the record
      • InstancePostProcessingEventHandler
      • ModifyRecordEventHandler
      • MarcBibliographicMatchEventHandler
    • if all profiles have been applied to the record, a DI_COMPLETED event is sent to signal that the import process is finished (in case of an error, a DI_ERROR event is published)

  5. SRM accepts the completion event and finalizes the process for the record

    • entry point: RecordProcessedEventHandlingServiceImpl
    • update general job execution progress with the record result
    • save journal record if necessary
    • update final job execution status if the record is the last one to be imported
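
The hand-offs between SRM and SRS in the steps above can be summarized as a small routing table. This is only an illustrative sketch: the event and class names come from the steps above, while the BibCreationFlowSketch class and the pairing of each event with the entry point that picks it up are assumptions inferred from the step order, not code from either module.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative routing table for the MARC Bib creation flow.
// Names are taken from the steps above; the table itself is hypothetical.
final class BibCreationFlowSketch {

    // Maps each event to the module and entry point assumed to consume it.
    // Insertion order mirrors the order in which the events occur during an import.
    static Map<String, String> consumers() {
        Map<String, String> byEvent = new LinkedHashMap<>();
        byEvent.put("DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED", "SRS: ParsedMarcChunksKafkaHandler");
        byEvent.put("DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED", "SRM: StoredMarcChunksKafkaHandler");
        byEvent.put("DI_SRS_MARC_BIB_RECORD_CREATED", "SRS: DataImportKafkaHandler");
        byEvent.put("DI_COMPLETED", "SRM: RecordProcessedEventHandlingServiceImpl");
        return byEvent;
    }

    public static void main(String[] args) {
        // Print the chain in processing order.
        consumers().forEach((event, handler) -> System.out.println(event + " -> " + handler));
    }
}
```

Reading the table top to bottom shows that three of the four events carry "_BIB_" in their names, which is where MARC Authority support needs attention.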

Creation flow modification

In general, the current flow should be applicable to importing MARC Authority records. Some issues have been identified, but the list is not complete and further investigation might reveal other problems. The known limitations are related to the implicit use of the MARC Bib record type:

  1. SRM always publishes Data Import Events with Record mapped to MARC Bib type
  2. Event naming includes the MARC Bib type explicitly. The naming does not support other types, nor is there a mechanism to identify the required event by record type

SRM always publishes Data Import Events with MARC Bib type

SRM publishes record events through the RecordsPublishingServiceImpl class, which prepares the event payload as follows:

RecordsPublishingServiceImpl.prepareEventPayload() method
private DataImportEventPayload prepareEventPayload(Record record, ProfileSnapshotWrapper profileSnapshotWrapper,
                                                   JsonObject mappingRules, MappingParameters mappingParameters, OkapiConnectionParams params,
                                                   String eventType) {
  HashMap<String, String> dataImportEventPayloadContext = new HashMap<>();
  dataImportEventPayloadContext.put(MARC_BIBLIOGRAPHIC.value(), Json.encode(record));
  dataImportEventPayloadContext.put("MAPPING_RULES", mappingRules.encode());
  dataImportEventPayloadContext.put("MAPPING_PARAMS", Json.encode(mappingParameters));

  return new DataImportEventPayload()
    .withEventType(eventType)
    .withProfileSnapshot(profileSnapshotWrapper)
    .withCurrentNode(profileSnapshotWrapper.getChildSnapshotWrappers().get(0))
    .withJobExecutionId(record.getSnapshotId())
    .withContext(dataImportEventPayloadContext)
    .withOkapiUrl(params.getOkapiUrl())
    .withTenant(params.getTenantId())
    .withToken(params.getToken());
}

Line #5 of the method encodes the Record and places it into the context map under the key "MARC_BIBLIOGRAPHIC". This happens regardless of the actual record type, which can be one of the following:

EntityType from data-import-processing-core module
public enum EntityType {

  MARC_BIBLIOGRAPHIC("MARC_BIBLIOGRAPHIC"),
  MARC_HOLDINGS("MARC_HOLDINGS"),
  MARC_AUTHORITY("MARC_AUTHORITY"),
  EDIFACT_INVOICE("EDIFACT_INVOICE"),
  DELIMITED("DELIMITED"),
  INSTANCE("INSTANCE"),
  HOLDINGS("HOLDINGS"),
  ITEM("ITEM"),
  ORDER("ORDER"),
  INVOICE("INVOICE"),
  STATIC_VALUE("STATIC_VALUE");
  private final String value;
  // constructor and value() accessor are not shown in this excerpt
}

The enumeration already defines the type required for MARC Authority records; it only has to be detected from the record and placed into the payload. A detection mechanism is implemented in MarcRecordAnalyzer.java from the data-import-utils module. Using it would allow a record to be put into the context with the appropriate type value as the key.
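
A minimal sketch of this idea follows, assuming the record type can be derived from leader position 06 ("type of record") as defined by the MARC 21 formats. The TypedPayloadContextSketch class, its detectType and buildContext methods, and the reduced MarcEntityType enum are hypothetical stand-ins; the actual MarcRecordAnalyzer API may look different and is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: store the record in the payload context under its detected type.
final class TypedPayloadContextSketch {

    // Only the three MARC entries of EntityType are needed for this illustration.
    enum MarcEntityType { MARC_BIBLIOGRAPHIC, MARC_HOLDINGS, MARC_AUTHORITY }

    // Stand-in for the MarcRecordAnalyzer lookup (assumption): classifies a record
    // by the MARC 21 leader/06 code.
    static MarcEntityType detectType(String leader) {
        if (leader == null || leader.length() < 7) {
            throw new IllegalArgumentException("MARC leader is missing or too short");
        }
        char typeOfRecord = leader.charAt(6);
        if (typeOfRecord == 'z') {
            return MarcEntityType.MARC_AUTHORITY;        // authority data
        }
        if ("uvxy".indexOf(typeOfRecord) >= 0) {
            return MarcEntityType.MARC_HOLDINGS;         // holdings codes
        }
        if ("acdefgijkmoprt".indexOf(typeOfRecord) >= 0) {
            return MarcEntityType.MARC_BIBLIOGRAPHIC;    // bibliographic codes
        }
        throw new IllegalArgumentException("Unsupported leader/06 value: " + typeOfRecord);
    }

    // Builds the context map with the record keyed by its detected type
    // instead of the hard-coded MARC_BIBLIOGRAPHIC key.
    static Map<String, String> buildContext(String leader, String encodedRecord,
                                            String mappingRules, String mappingParams) {
        Map<String, String> context = new HashMap<>();
        context.put(detectType(leader).name(), encodedRecord);
        context.put("MAPPING_RULES", mappingRules);
        context.put("MAPPING_PARAMS", mappingParams);
        return context;
    }
}
```

With the leader of the sample Bib record above ("01338cas a2200409 4500") the record would be stored under MARC_BIBLIOGRAPHIC, while a leader with "z" in position 06 would place it under MARC_AUTHORITY.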

Event naming includes MARC Bib type only, no support for other types

Data import defines a list of available events in DataImportEventTypes enum:

DataImportEventTypes
public enum DataImportEventTypes {

  DI_RAW_MARC_BIB_RECORDS_CHUNK_READ("DI_RAW_MARC_BIB_RECORDS_CHUNK_READ"),
  DI_MARC_BIB_FOR_UPDATE_RECEIVED("DI_MARC_BIB_FOR_UPDATE_RECEIVED"),
  DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED("DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED"),
  DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED("DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED"),
  DI_SRS_MARC_BIB_RECORD_CREATED("DI_SRS_MARC_BIB_RECORD_CREATED"),
  DI_SRS_MARC_BIB_INSTANCE_HRID_SET("DI_SRS_MARC_BIB_INSTANCE_HRID_SET"),
  DI_SRS_MARC_AUTHORITY_RECORD_CREATED("DI_SRS_MARC_AUTHORITY_RECORD_CREATED"),
  DI_SRS_MARC_BIB_RECORD_UPDATED("DI_SRS_MARC_BIB_RECORD_UPDATED"),
  DI_SRS_MARC_BIB_RECORD_MODIFIED("DI_SRS_MARC_BIB_RECORD_MODIFIED"),
  DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING("DI_SRS_MARC_BIB_RECORD_MODIFIED_READY_FOR_POST_PROCESSING"),
  DI_SRS_MARC_BIB_RECORD_MATCHED("DI_SRS_MARC_BIB_RECORD_MATCHED"),
  DI_SRS_MARC_BIB_RECORD_MATCHED_READY_FOR_POST_PROCESSING("DI_SRS_MARC_BIB_RECORD_MATCHED_READY_FOR_POST_PROCESSING"),
  DI_SRS_MARC_BIB_RECORD_NOT_MATCHED("DI_SRS_MARC_BIB_RECORD_NOT_MATCHED"),
  DI_INVENTORY_INSTANCE_CREATED("DI_INVENTORY_INSTANCE_CREATED"),
  DI_INVENTORY_INSTANCE_MATCHED("DI_INVENTORY_INSTANCE_MATCHED"),
  DI_INVENTORY_INSTANCE_NOT_MATCHED("DI_INVENTORY_INSTANCE_NOT_MATCHED"),
  DI_INVENTORY_INSTANCE_UPDATED_READY_FOR_POST_PROCESSING("DI_INVENTORY_INSTANCE_UPDATED_READY_FOR_POST_PROCESSING"),
  DI_INVENTORY_INSTANCE_UPDATED("DI_INVENTORY_INSTANCE_UPDATED"),
  DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING("DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING"),
  DI_INVENTORY_ITEM_CREATED("DI_INVENTORY_ITEM_CREATED"),
  DI_INVENTORY_ITEM_MATCHED("DI_INVENTORY_ITEM_MATCHED"),
  DI_INVENTORY_ITEM_NOT_MATCHED("DI_INVENTORY_ITEM_NOT_MATCHED"),
  DI_INVENTORY_ITEM_UPDATED("DI_INVENTORY_ITEM_UPDATED"),
  DI_INVENTORY_HOLDING_CREATED("DI_INVENTORY_HOLDING_CREATED"),
  DI_INVENTORY_HOLDING_MATCHED("DI_INVENTORY_HOLDING_MATCHED"),
  DI_INVENTORY_HOLDING_NOT_MATCHED("DI_INVENTORY_HOLDING_NOT_MATCHED"),
  DI_INVENTORY_HOLDING_UPDATED("DI_INVENTORY_HOLDING_UPDATED"),
  DI_ERROR("DI_ERROR"),
  DI_COMPLETED("DI_COMPLETED");

}

Some events are issued upon general-purpose actions. Examples of such events are:

  • DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED
  • DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED

They differ from other events in that they can apply to MARC records of any type. For instance, parsing an incoming MARC record into the common format doesn't depend on the record type. The same is true for saving a MARC record in the SRS database.

Because of the above, it makes sense to generalize some events by removing the "_BIB_" part from the name, for instance:

  • DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED becomes DI_RAW_MARC_RECORDS_CHUNK_PARSED
  • DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED becomes DI_PARSED_MARC_RECORDS_CHUNK_SAVED

Once the general flow completes, the process can be customized for a particular record type with specific events that include "_BIB_", "_AUTH_", or other qualifiers. This separation is supposed to happen after record creation, so the events that signal a record's successful creation must include the type qualifier:

  • DI_SRS_MARC_BIB_RECORD_CREATED
  • DI_SRS_MARC_AUTH_RECORD_CREATED

This should give more control over customizing record processing and limit the number of unwanted executions of services that are interested in one record type but not in the other.
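
The proposed naming scheme could be sketched as follows. The EventNamingSketch class, its MarcQualifier enum, and its helper methods are hypothetical and exist only to illustrate the rule: type-independent events drop the "_BIB_" part, while the record-created event keeps a type qualifier. The sketch uses the "_AUTH_" spelling from the list above, whereas the current DataImportEventTypes listing spells the existing constant DI_SRS_MARC_AUTHORITY_RECORD_CREATED.

```java
// Hypothetical sketch of the proposed event naming scheme.
final class EventNamingSketch {

    // Type qualifiers as they would appear inside type-specific event names.
    enum MarcQualifier { BIB, AUTH }

    // Turns a Bib-specific event name into its type-independent form
    // by dropping the "_BIB_" part; other names are returned unchanged.
    static String generalize(String bibEventName) {
        return bibEventName.replace("_MARC_BIB_", "_MARC_");
    }

    // Builds the type-specific event that announces a newly created record.
    static String recordCreatedEvent(MarcQualifier qualifier) {
        return "DI_SRS_MARC_" + qualifier.name() + "_RECORD_CREATED";
    }

    public static void main(String[] args) {
        System.out.println(generalize("DI_RAW_MARC_BIB_RECORDS_CHUNK_PARSED"));
        System.out.println(generalize("DI_PARSED_MARC_BIB_RECORDS_CHUNK_SAVED"));
        System.out.println(recordCreatedEvent(MarcQualifier.BIB));
        System.out.println(recordCreatedEvent(MarcQualifier.AUTH));
    }
}
```

A consumer interested only in authority records would then subscribe to the "_AUTH_" created event and never see Bib traffic.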

Creation profile

Similar to MARC Bib records, there has to be a default data import profile for MARC Authority records. Unlike importing bibliographic records, though, there is no need to create any additional entities, such as inventory records, during authority record import. So the profile could be simplified to a job profile that includes a single action: Create MARC Authority in SRS.

Mock ups for the default profile:

Update from Folijet

For the simple task of creating a MARC Authority record, only an empty job profile is needed, without any actions.


  • Job profile

  • Action profile

For reference, the default Job/Action profiles for MARC Bib record import:

Job: Create Instance and MARC Bib in SRS
{
	"id": "6409dcff-71fa-433a-bc6a-e70ad38a9604",
	"name": "Default - Create instance and SRS MARC Bib",
	"description": "This job profile creates SRS MARC Bib records and corresponding Inventory Instances using the library's default MARC-to-Instance mapping. It can be edited, duplicated, or deleted.",
	"dataType": "MARC",
	"deleted": false,
	"userInfo": {
		"firstName": "System",
		"lastName": "System",
		"userName": "System"
	},
	"parentProfiles": [],
	"childProfiles": [],
	"metadata": {
		"createdDate": "2021-01-14T14:00:00.000+00:00",
		"createdByUserId": "00000000-0000-0000-0000-000000000000",
		"updatedDate": "2021-01-14T15:00:00.462+00:00",
		"updatedByUserId": "00000000-0000-0000-0000-000000000000"
	}
}


Action: Create Instance
{
	"action": "CREATE",
	"childProfiles": [],
	"deleted": false,
	"description": "This action profile is used with FOLIO's default job profile for creating Inventory Instances and SRS MARC Bibliographic records. It can be edited, duplicated, or deleted.",
	"folioRecord": "INSTANCE",
	"id": "f8e58651-f651-485d-aead-d2fa8700e2d1",
	"metadata": {
		"createdByUserId": "00000000-0000-0000-0000-000000000000",
		"createdDate": "2021-01-14T14:00:00.000+00:00",
		"updatedByUserId": "00000000-0000-0000-0000-000000000000",
		"updatedDate": "2021-01-14T15:00:00.462+00:00"
	},
	"name": "Default - Create instance",
	"parentProfiles": [],
	"userInfo": {
		"firstName": "System",
		"lastName": "System",
		"userName": "System"
	}
}

The data above contains no direct relationship between these records. The relationship is tracked in records stored in mapping tables (such as the job_to_action_profiles table). See the DB schema of profile-related objects below; it belongs to the mod-data-import-converter-storage module.

Open questions:

  • Question: Do we need some mapping rules and mapping parameters? If yes, can the rules be empty to avoid code modification?
    Answer (per Folijet): Neither mapping rules nor mapping parameters are needed to create a MARC Auth record.
  • Question: What type of profiles are required?
    Answer (per Folijet): Only an empty job profile is required. Other profiles, like Action or Matching, are not needed.

Generation of HRID

TBD

Uncovered Areas

Below are the business areas that are mentioned in the requirements but not covered by this spike:

  1. Updating MARC Authority records
    1. existing flow and modifications required to support Authority records
    2. matching profiles
    3. mapping rules and parameters
  2. Searching MARC Authority records with SRS MARC Query API