Folijet - Morning Glory Snapshot Performance testing

The following resources are used:

  • 6 m4.large EC2 spot instances for the Kubernetes cluster;
  • 1 db.r5.xlarge instance for the RDS service (writer);
  • 1 m5.large instance per 2 availability zones for Kafka on MSK.

Previous performance testing results: Lotus Snapshot Performance testing


Modules:

Data Import Module (mod-data-import-2.5.0-SNAPSHOT.231)

Source Record Manager Module (mod-source-record-manager-3.4.0-SNAPSHOT.621)

Source Record Storage Module (mod-source-record-storage-5.4.0-SNAPSHOT.426)

Inventory Module (mod-inventory-18.2.0-SNAPSHOT.537)

Inventory Storage Module (mod-inventory-storage-23.1.0-SNAPSHOT.692)

Data Import Converter Storage (mod-data-import-converter-storage-1.14.0-SNAPSHOT.202)

Invoice business logic module (mod-invoice-5.4.0-SNAPSHOT.306)

Data Export Module (mod-data-export-4.5.0-SNAPSHOT.319)


Performance-optimized configuration:

Folio


MAX_REQUEST_SIZE = 4000000 (for all modules)

Kafka


2 tasks for all DI modules (except mod-data-import)

2 partitions for all DI Kafka topics

Please note: the environment should be configured so that every Kafka topic has as many partitions as there are instances of the module consuming from that topic.

Examples:

Delete the old topic:
./kafka-topics.sh --bootstrap-server=<kafka-ip>:9092 --delete --topic perf-eks-folijet.Default.fs09000000.DI_ERROR


Recreate the topic with "--partitions 2 --replication-factor 1":
./kafka-topics.sh --bootstrap-server=<kafka-ip>:9092 --create --topic perf-eks-folijet.Default.fs09000000.DI_ERROR --partitions 2 --replication-factor 1

Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic perf-eks-folijet.Default.fs09000000.DI_ERROR.

Get topic info:
./kafka-topics.sh --bootstrap-server=<kafka-ip>:9092 --describe --topic perf-eks-folijet.Default.fs09000000.DI_ERROR

Topic: perf-eks-folijet.Default.fs09000000.DI_ERROR PartitionCount: 2   ReplicationFactor: 1    Configs: min.insync.replicas=1,message.format.version=2.6-IV0,unclean.leader.election.enable=true
    Topic: perf-eks-folijet.Default.fs09000000.DI_ERROR Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: perf-eks-folijet.Default.fs09000000.DI_ERROR Partition: 1   Leader: 2   Replicas: 2 Isr: 2
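The topic names in the commands above follow the pattern {environment}.{namespace}.{tenant}.{eventType}. When many DI topics need the same treatment, the delete/recreate commands can be generated from that pattern. The sketch below is illustrative: the helper names are hypothetical, and the event-type list is an example, not a complete inventory of DI topics.

```python
# Sketch: generate delete/recreate command pairs for DI Kafka topics.
# Topic pattern observed above: {env}.{namespace}.{tenant}.{eventType}.
# Helper names and the event-type list are illustrative assumptions.

def topic_name(env: str, tenant: str, event_type: str, namespace: str = "Default") -> str:
    """Build a DI topic name, e.g. perf-eks-folijet.Default.fs09000000.DI_ERROR."""
    return f"{env}.{namespace}.{tenant}.{event_type}"

def recreate_commands(env: str, tenant: str, event_types,
                      partitions: int = 2, replication: int = 1):
    """Yield (delete_cmd, create_cmd) shell command pairs for each topic."""
    for event_type in event_types:
        topic = topic_name(env, tenant, event_type)
        yield (
            f"./kafka-topics.sh --bootstrap-server=<kafka-ip>:9092 --delete --topic {topic}",
            f"./kafka-topics.sh --bootstrap-server=<kafka-ip>:9092 --create --topic {topic} "
            f"--partitions {partitions} --replication-factor {replication}",
        )

for delete_cmd, create_cmd in recreate_commands("perf-eks-folijet", "fs09000000", ["DI_ERROR"]):
    print(delete_cmd)
    print(create_cmd)
```

The generated commands are the same delete/recreate pair shown above, so the snippet only saves typing when the partition count changes across many topics.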

JVM


mod-data-import: -XX:MaxRAMPercentage=85.0 -XX:+UseG1GC / cpu: 128m/192m | memory: 1Gi/1Gi

mod-source-record-manager: -XX:MaxRAMPercentage=65 -XX:MetaspaceSize=120M -XX:+UseG1GC / DB_MAXPOOLSIZE = 15 / DB_RECONNECTATTEMPTS = 3 / DB_RECONNECTINTERVAL = 1000 /  cpu: 512m/1024m | memory: 1844Mi / 2Gi

mod-source-record-storage: -XX:MaxRAMPercentage=65 -XX:MetaspaceSize=120M -XX:+UseG1GC / DB_MAXPOOLSIZE = 15 / cpu: 512m/1024m | memory: 1296Mi/1440Mi

mod-inventory: -XX:MaxRAMPercentage=80 -XX:MetaspaceSize=120M -XX:+UseG1GC -Dorg.folio.metadata.inventory.storage.type=okapi  / DB_MAXPOOLSIZE = 15 / cpu: 512m/1024m | memory: 2592Mi/2880Mi

mod-inventory-storage: -XX:MaxRAMPercentage=80 -XX:MetaspaceSize=120M -XX:+UseG1GC / DB_MAXPOOLSIZE = 15 / cpu: 512m/1024m | memory: 1024Mi/1200Mi
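-XX:MaxRAMPercentage ties the maximum heap size to the container memory limit, so the effective heap for each module can be estimated from the limits listed above. A rough sketch (it ignores the small reservations the JVM subtracts before applying the percentage, so real values are slightly lower):

```python
def max_heap_mib(memory_limit_mib: float, max_ram_percentage: float) -> float:
    """Approximate heap ceiling the JVM derives from -XX:MaxRAMPercentage
    and the container memory limit (simplified model)."""
    return memory_limit_mib * max_ram_percentage / 100.0

# mod-source-record-manager: 65% of a 2 Gi (2048 MiB) limit
print(round(max_heap_mib(2048, 65)))  # → 1331
# mod-inventory: 80% of a 2880 MiB limit
print(round(max_heap_mib(2880, 80)))  # → 2304
```

This is useful for checking that the heap ceiling stays comfortably below the container limit, leaving headroom for metaspace and off-heap buffers.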

Tests:

env | profile | records number | time in Morning Glory | time in Lotus | Kafka partition number | module instance number | CPU | description
--- | --- | --- | --- | --- | --- | --- | --- | ---
MG Perf Rancher | PTF Create - 2 | 5,000 | 7 min | 8 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+; 2022-06-10T07:44:27.576+00:00 → 2022-06-10T07:51:11.140+00:00
MG Perf Rancher | PTF Create - 2 | 5,000 | 7 min | 8 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ with -Ddi.flow.control.enable=false; 2022-06-14T10:24:44.093+00:00 → 2022-06-14T10:31:54.725+00:00
MG Perf Rancher | PTF Update - 1 | 5,000 | 11 min | 13 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+; 2022-06-20T06:18:46.748+00:00 → 2022-06-20T06:30:02.991+00:00
MG Perf Rancher | PTF Create - 2 | 10,000 | 16 min | 19 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+; 2022-06-10T07:54:23.720+00:00 → 2022-06-10T08:08:48.484+00:00
MG Perf Rancher | PTF Create - 2 | 10,000 | 16 min | 19 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+ with -Ddi.flow.control.enable=false; 2022-06-14T10:36:41.482+00:00 → 2022-06-14T10:53:03.556+00:00
MG Perf Rancher | PTF Update - 1 | 10,000 | 22 min | 25 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+; 2022-06-20T07:07:00.594+00:00 → 2022-06-20T07:28:54.905+00:00
MG Perf Rancher | PTF Create - 2 | 50,000 | 59 min | 1h 25min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+; 2022-06-10T08:12:29.178+00:00 → 2022-06-10T09:11:34.642+00:00
MG Perf Rancher | PTF Update - 1 | 50,000 | 1h 42min | 2h 17min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+; 2022-06-20T09:11:41.701+00:00 → 2022-06-20T10:54:29.378+00:00
MG Perf Rancher | PTF Create - 2 | 100,000 | 2h 20min | 2h 24min (22 errors) | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+; 2022-06-13T09:30:35.574+00:00 → 2022-06-13T12:26:52.484+00:00
MG Perf Rancher | PTF Update - 1 | 100,000 | 2h 49min | 4h 40min (Lotus test ran with 1 module instance and 1 partition) | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+; 2022-06-21T11:46:43.175+00:00 → 2022-06-21T14:36:05.532+00:00; 57 Inventory/Inventory-storage errors: io.netty.channel.StacklessClosedChannelException; io.vertx.core.impl.NoStackTraceThrowable: Connection is not active now, current status: CLOSED; io.vertx.core.impl.NoStackTraceThrowable: Timeout
MG Perf Rancher | PTF Create - 2 | 500,000 | 14h 46min (60 errors) | 15h 37min (31 errors) | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.659+; 2022-06-13T14:27:40.568+00:00 → 2022-06-14T05:14:27.458+00:00
MG Bugfest | Default Marc Bib Create | 5,000 | 4 min | - | 2 | 2 | 512/1024 | mod-source-record-manager Xmx = 2G; 2022-08-15T14:57:13.753+00:00 → 2022-08-15T15:01:29.365+00:00
MG Bugfest | Default Marc Bib Create | 10,000 | 10 min | - | 2 | 2 | 512/1024 | mod-source-record-manager Xmx = 2G; 2022-08-15T15:03:07.364+00:00 → 2022-08-15T15:13:28.827+00:00
MG Bugfest | Create SRS MARC Authority | 5,000 | 5 min | - | 2 | 2 | 512/1024 | mod-source-record-manager Xmx = 2G; 2022-08-16T00:15:55.396+00:00 → 2022-08-16T00:20:19.240+00:00
MG Bugfest | Create SRS MARC Authority | 10,000 | 8 min | - | 2 | 2 | 512/1024 | mod-source-record-manager Xmx = 2G; 2022-08-16T14:52:53.191+00:00 → 2022-08-16T15:00:12.723+00:00
MG Bugfest | Create SRS MARC Authority | 50,000 | 34 min | - | 2 | 2 | 512/1024 | mod-source-record-manager Xmx = 2G; 2022-08-16T15:01:15.360+00:00 → 2022-08-16T15:35:37.028+00:00

Results before flow control fix: MODSOURMAN-811

env | profile | records number | time | time in Lotus | Kafka partition number | module instance number | CPU | description
--- | --- | --- | --- | --- | --- | --- | --- | ---
MG Perf Rancher | PTF Create - 2 | 5,000 | 7 min | 8 min | 2 | 2 | 512/1024 | mod-source-record-manager-3.4.0-SNAPSHOT.621; 2022-05-27T12:58:30.331+00:00 → 2022-05-27T13:05:08.683+00:00
MG Perf Rancher | PTF Update - 1 | 5,000 | 10 min | 13 min | 2 | 2 | 512/1024 | 2022-05-27T13:22:35.123+00:00 → 2022-05-27T13:32:35.344+00:00
MG Perf Rancher | PTF Create - 2 | 10,000 | 21 min / 27 min | 19 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=false; 2022-05-30T09:51:13.876+00:00 → 2022-05-30T10:12:33.982+00:00 / 2022-05-31T18:13:05.977+00:00 → 2022-05-31T18:40:58.928+00:00
MG Perf Rancher | PTF Update - 1 | 10,000 | 30 min | 25 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=false; 2022-05-31T19:19:46.296+00:00 → 2022-05-31T19:49:59.651+00:00
MG Perf Rancher | PTF Create - 2 | 10,000 | 21 min | 19 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=true; 2022-05-31T20:02:06.368+00:00 → 2022-05-31T20:23:19.490+00:00
MG Perf Rancher | PTF Update - 1 | 10,000 | 31 min | 25 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=true; 2022-06-01T19:08:11.563+00:00 → 2022-06-01T19:39:58.803+00:00
MG Perf Rancher | PTF Create - 2 | 10,000 | 17 min | 19 min | 2 | 2 | 512/1024 | -Ddi.flow.control.enable=true -Ddi.flow.control.max.simultaneous.records=100 -Ddi.flow.control.records.threshold=50; 2022-06-03T09:20:07.654+00:00 → 2022-06-03T09:37:51.631+00:00
MG Perf Rancher | PTF Create - 2 | 30,000 | 1h 6min | 45 min | 2 | 2 | 512/1024 | 2022-05-27T13:37:12.980+00:00 → 2022-05-27T14:31:52.595+00:00
MG Perf Rancher | PTF Update - 1 | 30,000 | 1h 26min | - | 2 | 2 | 512/1024 | 2022-05-27T15:37:33.580+00:00 → 2022-05-27T17:03:15.702+00:00
MG Perf Rancher | PTF Create - 2 | 50,000 | 2h 37min | 1h 25min | 2 | 2 | 512/1024 | 3 errors: io.netty.channel.StacklessClosedChannelException; 2022-06-01T19:48:33.977+00:00 → 2022-06-01T22:25:59.700+00:00


60 errors (500K, PTF Create - 2):

Almost all errors came from mod-inventory-storage and were caused by insufficient memory for its instances (memory: 778Mi/846Mi). The mod-inventory-storage instances were restarted 2 times.


io.vertx.core.impl.NoStackTraceThrowable: {"errors":[{"message":"must not be null","type":"1","code":"javax.validation.constraints.NotNull.message","parameters":[{"key":"contributors[0].name","value":"null"}]}]}
io.vertx.core.impl.NoStackTraceThrowable: {"errors":[{"message":"must not be null","type":"1","code":"javax.validation.constraints.NotNull.message","parameters":[{"key":"contributors[2].name","value":"null"}]}]}

io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: Connection was closed: POST /holdings-storage/holdings
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: Connection was closed: POST /instance-storage/instances
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: finishConnect(..) failed: Connection refused: mod-inventory-storage.folijet.svc.cluster.local/172.20.250.48:80: POST /holdings-storage/holdings
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: finishConnect(..) failed: Connection refused: mod-inventory-storage.folijet.svc.cluster.local/172.20.250.48:80: POST /instance-storage/instances
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: readAddress(..) failed: Connection reset by peer: POST /holdings-storage/holdings
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: readAddress(..) failed: Connection reset by peer: POST /instance-storage/instances
io.vertx.core.impl.NoStackTraceThrowable: proxyClient failure: mod-inventory-storage-23.1.0-SNAPSHOT.692 http://mod-inventory-storage: readAddress(..) failed: Connection reset by peer: POST /item-storage/items

Performance testing of update item scenario after removal of index (MODDATAIMP-697):

In the scope of FOLIO-3388, the "item_status_name_idx_gin" index on the Item status field was removed from mod-inventory-storage. However, this index could potentially be used for matching by Item status during an Item update.

Job profile structure to update item and matching item by status field:

  • Job profile
  • Match profile (902$a to Item HRID)
    • For matches: sub match profile (Static match of "Available" to Item Loan and Availability Status)
      • For matches: action profile (action = update; Folio record type = Item)
        • Mapping profile (Folio record type = Item)

Test results:


description | records number | time (min, per run)
--- | --- | ---
Testing with index | 5,000 | 6, 40, 6, 6
Testing without index | 5,000 | 6, 11, 6, 6


Analysis of the query for matching item by status field:

Example of the CQL query built while processing the sub-match profile for matching by status: 

status.name == "Available" AND id == "4ae2603d-1f71-457f-b69a-3eed820d6cfb"

This CQL query is translated by mod-inventory-storage to the following SQL:

SELECT id, jsonb, creation_date, created_by, holdingsrecordid, permanentloantypeid, temporaryloantypeid, materialtypeid, permanentlocationid, temporarylocationid, effectivelocationid
FROM fs09000000_mod_inventory_storage.item
WHERE (
	CASE WHEN length(lower(f_unaccent('Available'))) <= 600 
		 THEN left(lower(f_unaccent(item.jsonb->'status'->>'name')),600) LIKE lower(f_unaccent('Available')) 
		 ELSE left(lower(f_unaccent(item.jsonb->'status'->>'name')),600) LIKE left(lower(f_unaccent('Available')),600) AND lower(f_unaccent(item.jsonb->'status'->>'name')) LIKE lower(f_unaccent('Available')) 
	END	
) AND (id='4ae2603d-1f71-457f-b69a-3eed820d6cfb')
LIMIT 2 OFFSET 0
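The CASE expression above performs a case-insensitive, accent-insensitive comparison truncated to 600 characters. A rough Python equivalent of that predicate may clarify the logic (this is an illustrative approximation: f_unaccent here is a simplified stand-in for the PostgreSQL function, and wildcard-free LIKE is treated as plain equality):

```python
import unicodedata

def f_unaccent(s: str) -> str:
    """Simplified stand-in for PostgreSQL's f_unaccent: strip combining marks."""
    return "".join(c for c in unicodedata.normalize("NFKD", s)
                   if not unicodedata.combining(c))

def status_matches(status_name: str, term: str) -> bool:
    """Mirror the truncated comparison from the generated SQL:
    compare the first 600 characters, falling back to a full comparison
    when the search term itself exceeds 600 characters."""
    lhs = f_unaccent(status_name or "").lower()
    rhs = f_unaccent(term).lower()
    if len(rhs) <= 600:
        return lhs[:600] == rhs
    return lhs[:600] == rhs[:600] and lhs == rhs

print(status_matches("Available", "Available"))    # → True
print(status_matches("Checked out", "Available"))  # → False
```

The 600-character truncation exists so the left-hand expression can be served by a b-tree-compatible functional index; for short values like item statuses the two branches are equivalent.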


For the particular case when matching by Item status is used as a sub-match profile, no index on the Item status field is used at all. Instead, the lookup is served by the primary-key index on the id field, which is more efficient.

Query plan:
"Limit  (cost=0.56..8.84 rows=1 width=1156) (actual time=0.038..0.040 rows=1 loops=1)"
"  ->  Index Scan using item_pkey on item  (cost=0.56..8.84 rows=1 width=1156) (actual time=0.038..0.039 rows=1 loops=1)"
"        Index Cond: (id = '4ae2603d-1f71-457f-b69a-3eed820d6cfb'::uuid)"
"        Filter: ("left"(lower(f_unaccent(((jsonb -> 'status'::text) ->> 'name'::text))), 600) ~~ 'available'::text)"
"Planning Time: 0.195 ms"
"Execution Time: 0.053 ms"

During testing of the item update scenario it was observed that deleting the "item_status_name_idx_gin" index does not impact the performance of matching an Item by status. According to the results of the analysis, this index is not used for matching an Item by the status field during data import.