Master Script normal load test - NLA report


Overview


Within the scope of PERF-524, tests were run to answer the following questions:

  • Can the current system accommodate an average load? The load model is described in "NLA load model investigation and creation".
  • What happens at peak times, when all workflows are running at once?
  • Typical KPIs:
    • Service CPU
    • Service Memory
    • DB CPU
    • DB Memory
  • Response times
  • Duration of long workflows
  • Recommendations to improve on scaling up/out modules to accommodate peak times

Summary

  • The current system can accommodate an average load only without Data Import (DI). With DI running, several workflows fail on source-storage/records/{id}/formatted requests with 'HTTP 500 Internal Server Error. If the issue persists, please report it to EBSCO Connect.' (PERF-582), and response times for all other workflows are up to 2 times longer.
  • Under ticket PERF-525, a set of tests was run with varying parameters for the DI modules. The results table shows that with mod-source-record-manager configured with 500 database connections and mod-source-record-storage configured with 500 database connections and a 60-sec timeout, the "HTTP 500 Internal Server Error" issue was not observed, but the response time for requests such as GET /source-storage/records/id/formatted still increases greatly during DI (up to 75 sec).
  • Increasing the number of database connections to DB_MAXPOOLSIZE = 200 for mod-source-record-manager and mod-source-record-storage did not eliminate the 'HTTP 500 Internal Server Error. If the issue persists, please report it to EBSCO Connect.' errors, but the duration of the Data Import job was halved.
  • Service CPU utilization did not exceed 36%, even at the start of the test when all processes begin together. Excluding the spikes at the start of the test and at the start of DI jobs, average CPU usage was about 15%. Instance CPU utilization did not exceed 22%.
  • Service memory utilization was stable, and no memory leaks were suspected during tests.
  • Each FYR (Fiscal close - end of FY rollover) job consumes a lot of DB CPU (each spike in the RDS CPU utilization graph corresponds to one FYR job). DB CPU usage peaked at approximately 92%.

Recommendations & Jiras (Optional)

Jiras

PERF-582

Test Runs & Results

Test # | Configuration | Test duration | Comments
1 | All workflows started at the same time | 1 hour | HTTP 500 errors during the Data Import (DI) process
2 | Data import and data export started with a 15 min delay | 1 hour | HTTP 500 errors during the Data Import process
3 | Data import and data export started with a 1 min delay | 1 hour | HTTP 500 errors during the Data Import process
4 | Data import started with a 1 min delay and data export started with a 20 min delay | 35 min | HTTP 500 errors during the Data Import process
5 | Same Jenkins configuration as Test #4, with DB_MAXPOOLSIZE increased to 200 for mod_srs and mod_srm | 30 min | HTTP 500 errors during the Data Import process. Data import duration was halved.
6 | Test without Data import | 1 hour | No errors

Test results from the 1st test run (the results of the 1st, 2nd, and 3rd test runs are similar):

Column groups:
  • Columns 3-6: total time it takes to complete the workflow.
  • Columns 7-8: after environment improvement, no server errors were observed (23-06-2023)***.
  • Column 9: time-consuming requests for each workflow during DI for the default configuration, finished with Response body: HTTP 500 Internal Server Error. If the issue persists, please report it to EBSCO Connect.

# | Workflow name | Avg with DI (sec) | 95th pct with DI (sec) | Avg no DI (sec) | 95th pct no DI (sec) | With DI Avg*** | Without DI*** | Time-consuming requests during DI
1 | CICO_Checkin | 1.238 | 1.506 | 1.054 | 1.591 | 1.188 | 1.089 |
2 | CICO_Checkout | 2.156 | 2.829 | 1.650 | 1.948 | 2.040 | 1.933 |
3 | IO_View invoices | 0.907 | 1.305 | 0.763 | 0.913 | 1.178 | 1.015 |
4 | IO_Create invoices | 1.433 | 1.815 | 1.174 | 1.370 | 1.689 | 1.795 |
5 | IO_Edit invoices | 1.983 | 2.422 | 1.581 | 1.897 | 2.276 | 1.015 |
6 | IO_Delete invoices | 1.070 | 1.196 | 0.804 | 0.927 | 1.248 | 1.175 |
7 | AIE_Approving Invoices | 1.752 | 2.211 | 1.453 | 1.940 | 3.208 | 2.740 |
8 | VAR_View Authority records | 22.037 | 30.604 | 0.289 | 0.381 | 49.268 | 0.295 | VAR_GET /source-storage/records/marc_id/formatted
9 | VTT_View MARC tag table | 41.272 | 61.935 | 0.987 | 1.284 | 75.006 | 1.052 | VTT_GET source-storage/records/{id}/formatted *2
10 | VH_View holdings records | 27.328 | 33.579 | 1.526 | 1.922 | 34.645 | 1.291 | VH_GET source-storage/records/{id}/formatted
11 | VB_View Bib | 22.851 | 31.634 | 0.841 | 1.168 | 42.175 | 0.900 | VB_GET source-storage/records/{id}/formatted
12 | PRO_View patron records | 0.672 | 1.118 | 0.566 | 0.883 | 0.359 | 0.638 |
13 | PRO_Delete patron records | 0.892 | 1.336 | 0.638 | 1.070 | 0.728 | 0.763 |
14 | PRO_Update patron records | 1.386 | 2.097 | 1.043 | 1.625 | 1.166 | 1.157 |
15 | PRO_Create patron records | 1.547 | 1.979 | 1.098 | 1.261 | 1.571 | 1.286 |
16 | LO_View Ledger | 0.122 | 0.458 | 0.050 | 0.088 | 0.125 | 0.047 |
17 | LO_Create ledger | 0.684 | 0.840 | 0.616 | 0.761 | 0.542 | 0.629 |
18 | LO_Edit ledger | 0.076 | 0.094 | 0.054 | 0.085 | 0.698 | 0.047 |
19 | LO_Delete a ledger | 0.080 | 0.129 | 0.046 | 0.080 | 0.073 | 0.050 |
20 | DE_Export bib "Default instances export job profile" | 11 sec (5000 records) | - | 5 sec (5000 records) | - | 6 sec (5000 records) | 5 sec (5000 records) |
21 | DE_Export holdings "Default holdings export job profile" | 3 min 16 sec (5000 records) | - | 26 sec (5000 records) | - | 28 sec (5000 records) | 27 sec (5000 records) |
22 | DE_Export authority records "Default authority export job profile" | 8 sec (5000 records) | - | 3 sec (5000 records) | - | 3 sec (5000 records) | 3 sec (5000 records) |
23 | DI "DISC HRID match" | 1 sec (1 record) | - | - | - | 1 sec | - |
24 | DI "DS LA edeposit records update" | 17 min 5 sec | - | - | - | 3 min 9 sec | - |
25 | DI "DISC New edeposit records" | - | - | - | - | 13 sec | - |
26 | DI "DISC New NON edeposit records" | 4 sec (5 records) | - | - | - | 3 sec | - |
27 | IRO_View item records | 24.771 | 32.327 | 1.289 | 1.649 | 37.804 | 1.449 | IRO_GET source-storage/records/{id}/formatted
28 | IRO_update item records | 17.226 | 33.950 | 0.998 | 1.250 | - | 1.033 | IRO_GET source-storage/records/{id}/formatted
29 | IRO_delete item records | 23.686 | 31.497 | 0.927 | 1.099 | 43.122 | 1.042 | IRO_GET source-storage/records/{id}/formatted
30 | MPS_Monitoring Pick Slips and Requests GET /circulation/requests | 0.434 | 0.527 | 0.359 | 0.480 | 0.324 | 0.384 |
31 | MPS_Monitoring Pick Slips and Requests GET /circulation/pick-slips/ | 0.106 | 0.312 | 0.112 | 0.256 | 0.086 | 0.116 |
32 | MPS_Monitoring Pick Slips and Requests | 0.297 | 0.297 | 0.303 | 0.303 | 0.136 | 0.184 |
33 | ULR_Users loan renewal | 1.899 | 2.395 | 1.467 | 1.661 | 1.939 | 1.564 |
34 | ILR_Item-level requests | 0.835 | 1.267 | 0.669 | 0.973 | 0.722 | 0.717 |
35 | VRO_View vendor records | 1.315 | 2.305 | 0.713 | 1.165 | 0.379 | 0.773 |
36 | VRO_Edit vendor records | 7.429 | 9.980 | 5.199 | 6.190 | 8.109 | 6.036 |
37 | VRO_Create vendor records | 1.436 | 1.829 | 1.064 | 1.200 | 1.579 | 1.189 |
38 | VRO_Delete vendor records | 0.635 | 0.885 | 0.412 | 0.522 | 0.266 | 0.458 |
39 | POO_Create purchase orders | 1.974 | 2.393 | 1.625 | 1.733 | 1.726 | 1.840 |
40 | POO_View purchase orders | 1.502 | 1.603 | 1.205 | 1.435 | 0.809 | 1.006 |
41 | POO_Edit purchase orders | 3.118 | 3.914 | 2.076 | 2.984 | 1.095 | 1.829 |
42 | POO_Delete purchase orders | 2.036 | 3.387 | 1.432 | 1.830 | 0.779 | 1.110 |
43 | RIH_Retrieving instances and holdings | 19.737 | 30.084 | 0.035 | 0.073 | 12.515 | 0.039 | RIH_GET source-storage/source-records
44 | ETT_Edit MARC tag table | 118.391 | 128.508 | 3.424 | 4.257 | 87.175 | 3.599 | ETT_GET /source-storage/records/instance_id/formatted; ETT_GET /records-editor/records
45 | FYR_Fiscal close - end of FY rollover | 10 min 30 s | - | 11 min | - | - | 13 min |
46 | VIR_Blacklight: View an inventory record JMeter script | 1.030 | 1.602 | 0.821 | 1.042 | 0.934 | 0.839 |
47 | BLS_Blacklight: Create a Request JMeter script | 1.336 | 1.605 | 1.122 | 1.404 | 1.139 | 1.156 |
48 | PRV_Blacklight: Create a View Patron record JMeter script | 0.106 | 0.136 | 0.073 | 0.110 | 0.078 | 0.073 |
49 | VIH_View instance holdings details | 21.241 | 31.698 | 1.456 | 1.572 | 47.835 | 1.535 | VIH_GET /source-storage/records/instanceId/formatted

* Workflows whose response times or durations are shown in red are at least 2 times slower than when running without Data Import jobs.
*** Test run results with mod-source-record-manager configured with 500 database connections and mod-source-record-storage configured with 500 database connections and a 60-sec timeout.
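
The "at least 2 times" rule from the note above can be sketched as a simple ratio check. This is a minimal illustration, not the tooling used for the report: the rows are a small sample copied from the results table (average with DI and average without DI, in seconds), and the names `ROWS`, `THRESHOLD`, `slowdown`, and `flag_slow` are hypothetical.

```python
# Sample rows copied from the results table:
# (workflow name, avg with DI in sec, avg without DI in sec).
ROWS = [
    ("CICO_Checkin", 1.238, 1.054),
    ("VAR_View Authority records", 22.037, 0.289),
    ("VTT_View MARC tag table", 41.272, 0.987),
    ("VRO_Edit vendor records", 7.429, 5.199),
    ("ETT_Edit MARC tag table", 118.391, 3.424),
]

# Assumption: "at least 2 times" is applied to the ratio of the two averages.
THRESHOLD = 2.0


def slowdown(with_di: float, without_di: float) -> float:
    """Ratio of the response time with DI to the response time without DI."""
    return with_di / without_di


def flag_slow(rows, threshold: float = THRESHOLD):
    """Return (workflow, ratio) for workflows at least `threshold` times slower with DI."""
    return [
        (name, round(slowdown(with_di, without_di), 1))
        for name, with_di, without_di in rows
        if slowdown(with_di, without_di) >= threshold
    ]


if __name__ == "__main__":
    for name, ratio in flag_slow(ROWS):
        print(f"{name}: {ratio}x slower with DI")
```

In this sample, only the workflows that call the source-storage "formatted" endpoint cross the threshold; CICO_Checkin and VRO_Edit vendor records stay well below it.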

Throughput graphs

The graphs for Test #1, Test #3, and Test #4 are almost the same.

Memory Utilization

Memory utilization of the most memory-consuming modules & DI modules:

  • mod-inventory - 98%
  • mod-orders - 73%
  • mod-source-record-manager - 44%
  • mod-source-record-storage - 23.5%
  • mod-inventory-storage - 31%
  • data-import - 22%
  • mod-di-converter-storage - 31%

The graph shows memory usage for the first 3 test runs; no memory leak is suspected in any of the modules.




Service CPU Utilization 

CPU usage did not exceed 36% for any module. Spikes in the CPU usage of DI modules are visible at the beginning of the Data Import jobs. Excluding the DI spikes, average CPU usage was about 15%.

Test #1 - Test #4 DI duration: 17 min.
Test #5 DI duration: 8 min.


Most CPU-consuming modules: 

  • mod-inventory - 36%
  • mod-authtoken - 25%
  • mod-data-import - 16%
  • mod-di-converter-storage - 15.6%
  • mod-quick-marc - 15%
  • mod-finance-storage - 14%
  • nginx-okapi - 14%
  • okapi - 11%
  • mod-tags - 10.6%
  • others - usage less than 10%
  • mod-source-record-manager - up to 7%
  • mod-source-record-storage - up to 4%


Instance CPU Utilization


RDS CPU Utilization 

Each FYR (Fiscal close - end of FY rollover) job consumes a lot of DB CPU (each spike in the graph corresponds to one FYR job).

DB CPU usage peaked at approximately 92%.


RDS Database Connections
Test #1 - Test #4, during the part of the test with the DI job: 620 connections.
Test #5, during the part of the test with the DI job: 820 connections (an additional 200 connections were allocated for the DI modules).
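
As a rough check on these figures, the connection counts can be compared against the max_connections value listed for db.r6g.xlarge in the Infrastructure section of the Appendix. This is only a sketch: it assumes the observed counts and the limit refer to the same database instance, and the helper names are hypothetical.

```python
# Figures taken from this report.
MAX_CONNECTIONS = 2731         # max_connections for db.r6g.xlarge (Appendix)
BASELINE_DI_CONNECTIONS = 620  # Test #1 - Test #4, during the DI job
EXTRA_DI_POOL = 200            # additional connections allocated in Test #5


def headroom(used: int, limit: int = MAX_CONNECTIONS) -> int:
    """Connections still available below the instance limit."""
    return limit - used


def utilization_pct(used: int, limit: int = MAX_CONNECTIONS) -> float:
    """Share of the connection limit in use, as a percentage."""
    return round(100 * used / limit, 1)


if __name__ == "__main__":
    test5 = BASELINE_DI_CONNECTIONS + EXTRA_DI_POOL
    for label, used in (("Test #1 - #4", BASELINE_DI_CONNECTIONS), ("Test #5", test5)):
        print(f"{label}: {used} connections, "
              f"{utilization_pct(used)}% of limit, {headroom(used)} free")
```

Under that assumption, both configurations use well under half of the available connections.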

Appendix

Infrastructure

Record counts:

  • mod_source_record_storage.marc_records_lb = 7300919
  • mod_source_record_storage.raw_records_lb = 7300919
  • mod_source_record_storage.records_lb = 7300919
  • mod_source_record_storage.marc_indexers = 245032159 (all records)
  • mod_source_record_storage.marc_indexers with field_no 010 = 1008129
  • mod_source_record_storage.marc_indexers with field_no 035 = 8968420
  • mod_inventory_storage.authority = 852215
  • mod_inventory_storage.holdings_record = 6091403
  • mod_inventory_storage.instance = 5581816
  • mod_inventory_storage.item = 5705915

PTF - environment ncp3

  • m6i.2xlarge EC2 instances located in US East (N. Virginia), us-east-1
  • 2 database instances: one reader and one writer

    Name | API Name | Memory GiB | vCPUs | max_connections
    R6G Extra Large | db.r6g.xlarge | 32 GiB | 4 vCPUs | 2731
  • MSK ptf-kakfa-3
    • 4 m5.2xlarge brokers in 2 zones
    • Apache Kafka version 2.8.0

    • EBS storage volume per broker 300 GiB

    • auto.create.topics.enable=true
    • log.retention.minutes=480
    • default.replication.factor=3
  • Kafka topics partitioning: 2 partitions for DI topics


Modules memory and CPU parameters

Modules | Version | Task Definition | Running Tasks | CPU | Memory | MemoryReservation | MaxMetaspaceSize | Xmx

mod-inventory-storage26.0.0121024220819523841440
mod-inventory20.0.4121024288025925121814
mod-tags2.0.1121281024896128768
mod-gobi2.6.0121281024896128700
mod-remote-storage2.0.2121024492044725123960
mod-invoice-storage5.6.0121281024896128700
edge-sip23.0.0121281024896128768
mod-users-bl7.5.01251214401152128922
edge-rtac2.6.0121281024896128768
mod-feesfines18.2.1121281024896128768
mod-rtac3.5.0121281024896128768
mod-erm-usage-harvester4.3.0121281024896128768
mod-search2.0.1124002592248010241440
mod-service-interaction2.2.212256204818445121290
edge-ncip1.8.1121281024896128768
mod-authtoken2.13.01251214401152128922
mod-permissions6.3.122512168415445121024
mod-circulation-storage16.0.012102415361440512896
mod-ncip1.13.1121281024896128768
mod-pubsub2.9.112102415361440512922
edge-orders2.8.112102415361440512922
mod-circulation23.5.412153628802592128700
edge-caiasoft2.0.0121281024896--
mod-data-export4.7.11110241024896128768
mod-organizations-storage4.5.1121281024896128700
mod-source-record-storage5.6.5122048560050005123600
mod-copycat1.4.0128961024896128768
mod-bulk-operations1.0.5121024307226005121536
mod-quick-marc3.0.011128228821765121664
mod-audit2.7.01210241024896128768
mod-oai-pmh3.11.3121024224820005121440
edge-connexion1.0.6121281024896128768
mod-kb-ebsco-java3.13.0121281024896128768
mod-patron5.5.2121281024896128768
mod-email1.15.3121281024896128768
mod-password-validator3.0.01212814401298512768
mod-login7.9.012102414401298512768
mod-data-export-worker3.0.12121024307226005122048
mod-agreements5.5.212128309625805122048
edge-oai-pmh2.6.1121024151213605121440
mod-eusage-reports1.3.0121281024896128768
mod-orders-storage13.5.0125121024896128700
mod-notify3.0.0121281024896128768
mod-source-record-manager3.6.2122048560050005123600
mod-di-converter-storage2.0.2221281024896128768
mod-template-engine1.18.0121281024896128768
mod-user-import3.7.2121281024896128768
mod-finance-storage8.4.1121281024896128700
mod-users19.1.1121281024896128768
mod-sender1.10.0121281024896128768
mod-graphql1.11.0121281024896128768
mod-licenses4.3.112128248023125121792
mod-invoice-b5.6.21251214401152128922
mod-event-config2.5.0121281024896128768
mod-calendar2.4.2121281024896128768
mod-erm-usage4.5.2121281024896128768
mod-patron-blocks1.8.01210241024896128768
mod-data-import2.7.111256204818445121292
mod-ebsconet2.0.01212812481024256700
edge-dematic2.0.0121281024896--
mod-task-list5.0.1111281024896128768
mod-courses1.4.7121281024896128768
mod-inventory-update3.0.1121281024896128768
mod-login-saml2.6.1121281024896128768
mod-orders12.6.612102420481440 (Recommended to change to 1544)5121024
mod-configuration5.9.1121281024896128768
mod-organizations1.7.0121281024896128700
mod-notes5.0.1121281024896128322
mod-finance4.7.1121281024896128700
mod-data-export-spring2.0.111256204818442561292
edge-patron4.11.0122561024896128768
okapi5.0.123102416841440512922
nginx-okapi2022.03.02121281024896--
pub-okapi2022.03.02121281024896-768
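
The table includes a recommendation to raise the MemoryReservation of mod-orders from 1440 to 1544. The report does not state the reasoning, but one plausible sanity check is that Xmx plus MaxMetaspaceSize should fit within MemoryReservation. The sketch below applies that assumed rule to three rows read from the table above; the rule itself and the names `ModuleMemory`, `jvm_budget`, and `over_reservation` are illustrative assumptions, not the team's documented method.

```python
from typing import NamedTuple


class ModuleMemory(NamedTuple):
    name: str
    memory_reservation: int
    max_metaspace: int
    xmx: int


# Values read from the table above (MemoryReservation, MaxMetaspaceSize, Xmx).
MODULES = [
    ModuleMemory("mod-inventory-storage", 1952, 384, 1440),
    ModuleMemory("mod-source-record-storage", 5000, 512, 3600),
    ModuleMemory("mod-orders", 1440, 512, 1024),
]


def jvm_budget(module: ModuleMemory) -> int:
    """Assumed lower bound on JVM memory: heap limit plus metaspace limit."""
    return module.xmx + module.max_metaspace


def over_reservation(modules):
    """Names of modules whose assumed JVM budget exceeds MemoryReservation."""
    return [m.name for m in modules if jvm_budget(m) > m.memory_reservation]


if __name__ == "__main__":
    for m in MODULES:
        print(f"{m.name}: budget {jvm_budget(m)} vs reservation {m.memory_reservation}")
    print("Over reservation:", over_reservation(MODULES))
```

Under this assumption, only mod-orders exceeds its reservation (1024 + 512 = 1536 > 1440), and the recommended value of 1544 would cover that sum.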

Methodology/Approach

JMeter scripts were used to test the baseline for normal NLA library usage.

Tested with different DI delays:

  • From test start
  • 1 min delay
  • 20 min delay
  • without DI

Data was gathered from 2 periods: with and without data import.


  • DI - data import
  • FYR - Fiscal close - end of FY rollover