Overview

This page investigates Aurora serverless performance by comparing the db.r6g.xlarge, db.r6g.8xlarge, and Aurora serverless DB instance types under load: Data Import (DI) with Check-in Check-out (CICO) running in the background.

Summary

  • The environment can handle the load with all compared DB instance types. 
  • No significant differences in CICO response times were observed between db.r6g.xlarge and serverless.
  • With Aurora serverless, DI duration improves relative to db.r6g.xlarge as the DI file gets larger.
  • The Serverless v2 (32 - 128 ACUs) configuration performs better from the start than (0.5 - 128 ACUs) because of its higher minimum capacity, and its performance is closer to 8xlarge. To cut costs, however, (0.5 - 128 ACUs) is the better choice for the DB reader instance role.
  • Aurora serverless RDS CPU did not exceed 25% for any file size. Test duration grows more slowly than file size, because more ACUs are allocated for bigger files.
  • DI duration without CICO did not change after the task count for mod-inventory, mod-inventory-storage, mod-circulation, and mod-circulation-storage was set to 4.

Results

The table shows test results for the different database instance types. RDS CPU utilization for db.r6g.xlarge reaches its maximum values, and test duration grows in proportion to file size.

After the database was switched to Aurora serverless, RDS CPU did not exceed 25% for any file size, and test duration grew more slowly than file size because more ACUs were allocated for bigger files.

DI CICO Total results

Create
Job profile: Default - Create instance and SRS MARC Bib

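Max CPU utilization is in %. Empty cells were not measured. Where a cell holds two values, both are listed as in the original table.

| # | Test | Users | File - Records | Duration (CICO) | RDS db.r6g.8xlarge: Max CPU utilization | RDS db.r6g.8xlarge: Duration | RDS db.r6g.xlarge: Max CPU utilization | RDS db.r6g.xlarge: Duration | Serverless v2 (0.5 - 128 ACUs): Max CPU utilization | Serverless v2 (0.5 - 128 ACUs): Duration | Serverless v2 (0.5 - 128 ACUs): ACUs | Serverless v2 (32 - 128 ACUs): Max CPU utilization | Serverless v2 (32 - 128 ACUs): Duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | DI Create | | 10k | | 37 / 27 | 00:05:15 / 00:03:21 | 96 | 00:09:59 | 17 | 00:10:07 | | 16 | 00:07:17 ↓ 28% |
| | | | 25k | | 45 / 30 | 00:10:04 / 00:08:08 | 96 | 00:18:19 | 24 | 00:13:43 | | 22 | 00:11:44 ↓ 15% |
| | | | 50k | | 30 | 00:15:54 | 93 | 00:37:05 | 25 | 00:22:57 | | 24 | 00:20:01 ↓ 11% |
| 2 | CICO + DI Create | 20 | 10k | 90 min | 39 | 00:04:32 | 94 | 00:08:08 | 19 | 00:09:12 | | | |
| | | | 25k | | 47 | 00:09:01 | 96 | 00:19:21 | 26 | 00:14:30 | | | |
| 3 | CICO DI Create (JP: PTF - Create 2) | 20 | 10k | 90 min | | | 94 | 00:09:56 | 14 | 00:13:22 | 19 | | |
| | | | 25k | | | | 94 | 00:21:06 | 24 | 00:23:49 | 25 | | |
| | CICO DI Update (JP: PTF - Updates Success - 1) | 20 | 10k | 90 min | | | 70 | 00:12:31 | 12 | 00:17:44 | 12 | | |
| | | | 25k | | | | 70 | 00:29:12 | 12 | 00:31:35 | 13 | | |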
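
The DI Create durations above can also be read as import throughput, which makes the effect of file size easier to compare across instance types. The sketch below is illustrative only: the durations are copied from the table, while the helper names (`to_seconds`, `records_per_second`) and the choice of records per second as the metric are assumptions, not part of the original test tooling.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# DI Create durations copied from the results table (records -> duration).
durations = {
    "db.r6g.xlarge": {
        10_000: "00:09:59", 25_000: "00:18:19", 50_000: "00:37:05",
    },
    "Serverless v2 (0.5 - 128 ACUs)": {
        10_000: "00:10:07", 25_000: "00:13:43", 50_000: "00:22:57",
    },
}


def records_per_second(instance: str) -> dict[int, float]:
    """Import throughput for each file size on the given instance type."""
    return {
        records: round(records / to_seconds(hms), 1)
        for records, hms in durations[instance].items()
    }


for instance in durations:
    print(instance, records_per_second(instance))
```

On these figures, serverless throughput rises with each file size, while db.r6g.xlarge throughput stays roughly flat between the 25k and 50k files.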

RDS CPU Utilization


RDS

  • 8xlarge (test date: 2023-05-25): CPU spikes at the beginning of the tests and returns to normal after they finish.
  • xlarge (test date: 2023-05-29): CPU was at its maximum, but this did not affect DI in any way, and it ran successfully.
  • serverless (test date: 2023-05-30): CPU was stable and did not exceed 25%.

Service

  • 8xlarge: data imports ran during CICO. The services were stable and returned to their normal state after the tests.
  • xlarge: the CICO background process did not affect DI, which worked as expected.
  • serverless: the services were stable.



CICO resource consumption

While running the CICO tests (PERF-593), xlarge used more DB connections than any other DB instance type. The summary table shows better response times for the runs with 20 users, and no significant differences between DB instance types. High latency was observed in all tests.

Testing results for CICO

Test date: 2023-06-02

LG: us-west-2a

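RDS max CPU utilization is in %.

| # | Test | Users | Duration (CICO) | RDS (db.r6g.xlarge): RDS max CPU utilization | RDS (db.r6g.xlarge): DB connections | RDS (db.r6g.8xlarge): RDS max CPU utilization | RDS (db.r6g.8xlarge): DB connections | Serverless v2 (0.5 - 128 ACUs): RDS max CPU utilization | Serverless v2 (0.5 - 128 ACUs): ACUs | Serverless v2 (0.5 - 128 ACUs): DB connections | Serverless v2 (32 - 128 ACUs): RDS max CPU utilization | Serverless v2 (32 - 128 ACUs): ACUs | Serverless v2 (32 - 128 ACUs): DB connections |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CICO | 8 | 30 min | 16 | 460 | 2 | 364 | 2.5 | 7.5 | 380 | 1.5 | 32 | 380 |
| | | 20 | 30 min | 21 | 430 | 2.5 | 378 | 4.7 | 6.2 | 396 | 2 | 32 | 380 |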

CICO Graphs


Graphs of Response Times Over Time, Throughput, RDS CPU utilization, and Service CPU utilization, each for 8 users and 20 users, on db.r6g.xlarge, db.r6g.8xlarge, Serverless v2 (0.5 - 128 ACUs), and Serverless v2 (32 - 128 ACUs).

Summary table for CICO



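| DB instance | Requests | Users | % KO | 75th pct | 95th pct | Average | Latency |
|---|---|---|---|---|---|---|---|
| db.r6g.xlarge | Check-In Controller | 8 | 0 | 2.878 | 3.114 | 2.785 | 2.16 |
| db.r6g.xlarge | Check-In Controller | 20 | 0 | 2.889 | 3.116 | 2.784 | 2.118 |
| db.r6g.xlarge | Check-Out Controller | 8 | 9.173 | 4.103 | 4.526 | 3.948 | 3.212 |
| db.r6g.xlarge | Check-Out Controller | 20 | 13.786 | 4.061 | 4.422 | 3.862 | 3.079 |
| db.r6g.8xlarge | Check-In Controller | 8 | 0 | 2.946 | 3.203 | 2.849 | 2.17 |
| db.r6g.8xlarge | Check-In Controller | 20 | 0 | 2.914 | 3.121 | 2.805 | 2.107 |
| db.r6g.8xlarge | Check-Out Controller | 8 | 10.419 | 4.178 | 4.565 | 3.973 | 3.239 |
| db.r6g.8xlarge | Check-Out Controller | 20 | 13.683 | 4.075 | 4.434 | 3.875 | 3.112 |
| Serverless v2 (0.5 - 128 ACUs) | Check-In Controller | 8 | 0 | 3.088 | 3.372 | 2.99 | 2.361 |
| Serverless v2 (0.5 - 128 ACUs) | Check-In Controller | 20 | 0 | 2.971 | 3.214 | 2.86 | 2.24 |
| Serverless v2 (0.5 - 128 ACUs) | Check-Out Controller | 8 | 9.255 | 4.465 | 4.862 | 4.268 | 3.453 |
| Serverless v2 (0.5 - 128 ACUs) | Check-Out Controller | 20 | 13.099 | 4.236 | 4.696 | 4.039 | 3.291 |
| Serverless v2 (32 - 128 ACUs) | Check-In Controller | 8 | 0 | 2.972 | 3.238 | 2.86 | 2.212 |
| Serverless v2 (32 - 128 ACUs) | Check-In Controller | 20 | 0 | 2.933 | 3.149 | 2.825 | 2.135 |
| Serverless v2 (32 - 128 ACUs) | Check-Out Controller | 8 | 10.545 | 4.191 | 4.652 | 3.998 | 3.274 |
| Serverless v2 (32 - 128 ACUs) | Check-Out Controller | 20 | 13.477 | 4.106 | 4.525 | 3.915 | 3.174 |
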


Comparison table for response times during 10k and 25k Data Import

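Serverless response times improve relative to db.r6g.xlarge for the bigger DI file. Delta is the difference between db.r6g.xlarge and Serverless in %, relative to db.r6g.xlarge; a positive delta means Serverless was faster.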


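| Requests | DI file | RDS (db.r6g.xlarge): 75th pct | RDS (db.r6g.xlarge): 95th pct | RDS (db.r6g.xlarge): Average | Serverless: 75th pct | Serverless: 95th pct | Serverless: Average | delta, 75% | delta, 95% |
|---|---|---|---|---|---|---|---|---|---|
| Check-In Controller | 10k | 3.218 | 3.71 | 3.138 | 3.347 | 3.867 | 3.118 | -4.01 | -4.23 |
| Check-In Controller | 25k | 3.249 | 3.665 | 3.076 | 3.134 | 3.398 | 2.99 | 3.54 | 7.29 |
| Check-Out Controller | 10k | 4.989 | 6.361 | 4.834 | 5.006 | 5.986 | 4.602 | -0.34 | 5.90 |
| Check-Out Controller | 25k | 5.246 | 6.298 | 4.666 | 4.719 | 5.19 | 4.333 | 10.05 | 17.59 |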
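
The delta columns can be reproduced from the percentile values in the comparison table. The sketch below assumes the delta is the difference between db.r6g.xlarge and Serverless divided by the db.r6g.xlarge value; the page does not state the formula, but this one matches every delta in the table. The function name `delta_pct` is hypothetical.

```python
def delta_pct(baseline: float, candidate: float) -> float:
    """Relative difference in %, positive when `candidate` is faster."""
    return round((baseline - candidate) / baseline * 100, 2)


# (75th pct, 95th pct) pairs copied from the comparison table:
# first db.r6g.xlarge, then Serverless.
rows = {
    ("Check-In Controller", "10k"): ((3.218, 3.71), (3.347, 3.867)),
    ("Check-In Controller", "25k"): ((3.249, 3.665), (3.134, 3.398)),
    ("Check-Out Controller", "10k"): ((4.989, 6.361), (5.006, 5.986)),
    ("Check-Out Controller", "25k"): ((5.246, 6.298), (4.719, 5.19)),
}

for (request, di_file), (xlarge, serverless) in rows.items():
    deltas = [delta_pct(b, c) for b, c in zip(xlarge, serverless)]
    print(request, di_file, deltas)
```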

Average Active Sessions for DI with 50k file

To capture additional Performance Insights data during DI with a 50K file (PERF-602), three DI operations were carried out, one for each of these DB instance types: Serverless v2 (0.5 - 128 ACUs), RDS (db.r6g.8xlarge), and db.r6g.xlarge.

Example of growing ACUs for data import 

Aurora Capacity Units graph for serverless (test date: 2023-05-31): ACUs grow with the load and scale down gradually once the load is gone.

Response times for all DB configurations

The error rate correlates with DI file size: it grows with bigger files. The lowest error rate during the 25K DI was with Serverless. All errors are in the Check-Out Controller, for POST_circulation/check-out-by-barcode (Submit_barcode_checkout)_POST_422.

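RDS db.r6g.8xlarge

| Requests | Period | 75th pct | 95th pct | Average |
|---|---|---|---|---|
| Check-In Controller | All | 2.901 | 3.103 | 2.792 |
| Check-In Controller | Before 10K DI | 2.866 | 3.128 | 2.772 |
| Check-In Controller | During 10K DI | 2.936 | 3.232 | 2.827 |
| Check-In Controller | During 25K DI | 2.933 | 3.138 | 2.815 |
| Check-In Controller | After 25K DI | 2.893 | 3.064 | 2.764 |
| Check-Out Controller | All | 4.255 | 4.767 | 3.956 |
| Check-Out Controller | Before 10K DI | 4.212 | 4.6 | 4.017 |
| Check-Out Controller | During 10K DI | 4.333 | 4.728 | 4.088 |
| Check-Out Controller | During 25K DI | 4.352 | 4.787 | 4.065 |
| Check-Out Controller | After 25K DI | 4.259 | 4.731 | 3.902 |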

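RDS db.r6g.xlarge

| Requests | Period | % KO | 75th pct | 95th pct | Average | Latency |
|---|---|---|---|---|---|---|
| Check-In Controller | All | 0 | 3.053 | 3.472 | 2.942 | 2.506 |
| Check-In Controller | Before 10K DI | 0 | 2.904 | 3.2 | 2.837 | 2.199 |
| Check-In Controller | During 10K DI | 0 | 3.218 | 3.71 | 3.138 | 2.726 |
| Check-In Controller | During 25K DI | 0 | 3.249 | 3.67 | 3.076 | 2.672 |
| Check-In Controller | After 25K DI | 0 | 2.952 | 3.17 | 2.856 | 2.242 |
| Check-Out Controller | All | 43.379 | 4.656 | 5.824 | 4.284 | 4.343 |
| Check-Out Controller | Before 10K DI | 9.188 | 4.322 | 4.9 | 4.205 | 3.474 |
| Check-Out Controller | During 10K DI | 16.061 | 4.989 | 6.36 | 4.834 | 4.914 |
| Check-Out Controller | During 25K DI | 36.691 | 5.246 | 6.3 | 4.666 | 4.841 |
| Check-Out Controller | After 25K DI | 67.369 | 4.271 | 4.83 | 3.935 | 3.427 |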

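Serverless

| Requests | Period | % KO | 75th pct | 95th pct | Average | Latency |
|---|---|---|---|---|---|---|
| Check-In Controller | Before 10K DI | 0 | 2.992 | 3.315 | 2.888 | 2.33 |
| Check-In Controller | During 10K DI | 0 | 3.347 | 3.867 | 3.118 | 2.854 |
| Check-In Controller | During 25K DI | 0 | 3.134 | 3.398 | 2.99 | 2.45 |
| Check-In Controller | After 25K DI | 0 | 2.961 | 3.164 | 2.85 | 2.237 |
| Check-Out Controller | Before 10K DI | 13.753 | 4.382 | 4.923 | 4.176 | 3.481 |
| Check-Out Controller | During 10K DI | 15.459 | 5.006 | 5.986 | 4.602 | 4.506 |
| Check-Out Controller | During 25K DI | 27.453 | 4.719 | 5.19 | 4.333 | 3.786 |
| Check-Out Controller | After 25K DI | 61.16 | 4.351 | 4.892 | 3.984 | 3.461 |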


Because of the high error rate, a new set of CICO DI tests was carried out with new job profiles for Create and Update (PTF - Create 2, PTF - Updates Success - 1).

CICO DI Create + Update


Graphs for Serverless and db.r6g.xlarge: Response Times Over Time (Create and Update), RDS CPU utilization, Service CPU utilization, and ACUs.


CICO response times

For Aurora serverless, response times grew immediately after DI started and then decreased smoothly while the job was running (PTF - Create 2 job profile).

For the xlarge DB instance type, CPU utilization stayed stable at about 15% during CICO alone, rose rapidly to 93% once DI with the 10k file started, and stayed at that level for the whole DI run.

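Serverless v2 (0.5 - 128 ACUs)

| Job | Requests | Period | 75th pct | 95th pct | Average | Latency_avg |
|---|---|---|---|---|---|---|
| Create | Check-In Controller | Before 10k | 2.928 | 3.171 | 2.855 | 1.851 |
| Create | Check-In Controller | During 10k | 3.391 | 3.999 | 3.248 | 2.242 |
| Create | Check-In Controller | During 25k | 3.156 | 3.427 | 3.06 | 2.07 |
| Create | Check-Out Controller | Before 10k | 4.198 | 5.012 | 4.106 | 2.788 |
| Create | Check-Out Controller | During 10k | 4.82 | 5.672 | 4.642 | 3.311 |
| Create | Check-Out Controller | During 25k | 4.53 | 4.93 | 4.407 | 3.085 |
| Update | Check-In Controller | Before 10k | 2.93 | 3.09 | 2.807 | 1.823 |
| Update | Check-In Controller | During 10k | 2.966 | 3.152 | 2.882 | 1.883 |
| Update | Check-In Controller | During 25k | 3.048 | 3.256 | 2.951 | 1.948 |
| Update | Check-Out Controller | Before 10k | 4.176 | 4.97 | 4.152 | 2.841 |
| Update | Check-Out Controller | During 10k | 4.23 | 4.46 | 4.134 | 2.823 |
| Update | Check-Out Controller | During 25k | 4.42 | 5.012 | 4.327 | 2.997 |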

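RDS (db.r6g.xlarge)

| Job | Requests | Period | 75th pct | 95th pct | Average | Latency_avg |
|---|---|---|---|---|---|---|
| Create | Check-In Controller | Before 10k | 2.764 | 2.867 | 2.786 | 1.788 |
| Create | Check-In Controller | During 10k | 3.204 | 3.461 | 3.077 | 2.08 |
| Create | Check-In Controller | During 25k | 3.318 | 3.606 | 3.176 | 2.178 |
| Create | Check-Out Controller | Before 10k | 4.02 | 4.155 | 4.045 | 2.74 |
| Create | Check-Out Controller | During 10k | 4.628 | 4.976 | 4.466 | 3.148 |
| Create | Check-Out Controller | During 25k | 4.861 | 5.181 | 4.672 | 3.341 |
| Update | Check-In Controller | Before 10k | 2.816 | 3.078 | 2.74 | 1.757 |
| Update | Check-In Controller | During 10k | 2.825 | 2.928 | 2.837 | 1.848 |
| Update | Check-In Controller | During 25k | 2.853 | 2.952 | 2.868 | 1.873 |
| Update | Check-Out Controller | Before 10k | 4.06 | 4.252 | 3.943 | 2.632 |
| Update | Check-Out Controller | During 10k | 4.077 | 4.202 | 4.097 | 2.78 |
| Update | Check-Out Controller | During 25k | 4.126 | 4.243 | 4.154 | 2.839 |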

Appendix

Folio release: Orchid

Resource usage: R/W split disabled for all modules

Links to Grafana

Test date: 2023-05-25 - 2023-05-31

Baseline xlarge

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685351205171&to=1685356817553

Baseline 8xlarge

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685025858811&to=1685033029740

Aurora Serverless

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685436750832&to=1685442470092


Test date: 2023-06-02 - 2023-06-06

db.r6g.xlarge

8 users: 

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685692747425&to=1685694623603

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685695312772&to=1685697366883

db.r6g.8xlarge

8 users: 

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685700612764&to=1685702445076

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685702803775&to=1685704908814

Serverless v2 (0.5 - 128 ACUs)

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686043433681&to=1686045340051

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1686045911070&to=1686048158943

Serverless v2 (32 - 128 ACUs)

8 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685710370012&to=1685712636325

20 users:

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-test-copy?orgId=1&var-percentile=95&var-test_type=baseline&var-test=circulation_checkInCheckOut_orchid&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All&from=1685713535200&to=1685715506600


Test date: 2023-06-13

Serverless v2 (0.5 - 128 ACUs) CICO DI Create + Update

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-with-average-latency?orgId=1&from=1686673259569&to=1686675910907&var-percentile=95&var-test_type=baseline&var-test=oasl_fixed1&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All

db.r6g.xlarge 

http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/jmeter-performance-with-average-latency?orgId=1&from=1686746379062&to=1686758375536&var-percentile=95&var-test_type=baseline&var-test=oasl_fixed1&var-env=int&var-grouping=1s&var-low_limit=250&var-high_limit=750&var-db_name=jmeter&var-sampler_type=All&var-Request=All
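
The `from` and `to` query parameters in the Grafana links above carry the test window as epoch milliseconds, so each link can be matched to its test date without opening the dashboard. The sketch below decodes them with the Python standard library. The helper name `link_window` is hypothetical, the example URL is the "Baseline xlarge" link trimmed to its time parameters, and UTC is an assumption (the test dates on this page may have been recorded in a local time zone).

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def link_window(url: str) -> tuple[datetime, datetime]:
    """Return the (from, to) window of a Grafana link as UTC datetimes."""
    query = parse_qs(urlparse(url).query)
    # Absolute time ranges are encoded as epoch milliseconds.
    start_ms = int(query["from"][0])
    end_ms = int(query["to"][0])
    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
    return start, end


# "Baseline xlarge" link, shortened here to the parameters that matter.
url = (
    "http://carrier-io.int.folio.ebsco.com/grafana/d/elIt9zCnz/"
    "jmeter-performance-test-copy?orgId=1"
    "&from=1685351205171&to=1685356817553"
)
start, end = link_window(url)
print(start.date(), round((end - start).total_seconds() / 60), "min")
```

For this link the window starts on 2023-05-29 and lasts roughly 94 minutes, which is consistent with the 90 min CICO runs.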

Configuration

DI

Version of modules:
Source Record Manager Module (mod-source-record-manager-3.6.2)
Source Record Storage Module (mod-source-record-storage-5.6.5)
Inventory Module (mod-inventory-20.0.4)
Inventory Storage Module (mod-inventory-storage-26.0.0)
Inventory Update Module (mod-inventory-update-3.0.1)
Data Import Module (mod-data-import-2.7.1)
quickMARC (mod-quick-marc-3.0.0)

CICO

Version of modules:

Okapi (okapi-5.0.1)

users (mod-users-19.1.1)

Remote storage API module (mod-remote-storage-2.0.2)

Pubsub (mod-pubsub-2.9.1)

Patron Blocks Module (mod-patron-blocks-1.8.0)

Inventory Storage Module (mod-inventory-storage-26.0.0)

Inventory Module (mod-inventory-20.0.4)

feesfines (mod-feesfines-18.2.1)

Configuration (mod-configuration-5.9.1)

Circulation Storage Module (mod-circulation-storage-16.0.0)

Circulation Module (mod-circulation-23.5.4)

authtoken (mod-authtoken-2.13.0)

Environment

  • UI endpoint: https://aurora-serverless-test.int.aws.folio.org/
  • Okapi endpoint: https://okapi-aurora-serverless-test.int.aws.folio.org/
  • Environment is configured to use shared MSK and ES
  • Created in INT account us-west-2 region, cluster name oasl, created with snapshot of Cornell Test environment.

    Modules versions: Orchid-GA.3
    Task count: HA – okapi x3, mod-data-import, mod-data-export, mod-quick-marc, mod-data-export-spring x1, all other modules x2
    OpenSearch: fse - shared domain (6 r6g.large.search datanodes)
    MSK: dedicated cluster - total 4 brokers (kafka.m5.large)
    RDS Configuration 1: db.r6g.8xlarge instance, Aurora PostgreSQL 13.9
    RDS Configuration 2: db.r6g.xlarge instance, Aurora PostgreSQL 13.9 
    RDS Configuration 3: Aurora Serverless, min ACU: 0.5, max ACU: 128 
    RDS Configuration 4: Aurora Serverless, min ACU: 32, max ACU: 128
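
For reference, the two serverless capacity ranges above correspond to the `ServerlessV2ScalingConfiguration` setting of an Aurora cluster. The sketch below only builds the request parameters that boto3's `modify_db_cluster` accepts; it makes no AWS call. The page does not say how the cluster was actually configured, so the helper name `scaling_request` and the cluster identifier `example-cluster` are placeholders.

```python
def scaling_request(cluster_id: str, min_acu: float, max_acu: float) -> dict:
    """Build modify_db_cluster parameters for a Serverless v2 ACU range."""
    if not 0 < min_acu <= max_acu:
        raise ValueError("expected 0 < min_acu <= max_acu")
    return {
        "DBClusterIdentifier": cluster_id,  # placeholder identifier
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
        },
        "ApplyImmediately": True,
    }


configs = {
    "RDS Configuration 3": scaling_request("example-cluster", 0.5, 128),
    "RDS Configuration 4": scaling_request("example-cluster", 32, 128),
}

# With AWS credentials in place, a request could be applied like this:
#   import boto3
#   boto3.client("rds").modify_db_cluster(**configs["RDS Configuration 4"])
for name, request in configs.items():
    print(name, request["ServerlessV2ScalingConfiguration"])
```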