Label | # Samples | Average | Min | Max | Std. Dev. | Error % | Throughput | Received KB/sec | Sent KB/sec | Avg. Bytes | Cache Usage, MB |
---|---|---|---|---|---|---|---|---|---|---|---|
Current solution | 667 | 2239 | 244 | 5703 | 1156.73 | 0.00% | 2.21114 | 15.41 | 1.14 | 7136.2 | 3.5 |
Optimized query with terms filter | 1791 | 832 | 234 | 6577 | 511.44 | 0.00% | 5.95179 | 41.47 | 3.07 | 7134.5 | 6.5 |
Terms queries | 1707 | 873 | 245 | 3780 | 527.85 | 0.00% | 5.66831 | 39.5 | 2.93 | 7135.1 | 26.7 |
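The relative gain of each optimized variant over the current solution can be read off the Average and Throughput columns above. The short sketch below only reuses the numbers from the table; the dictionary layout and the `relative_gain` helper are illustrative names, not part of the test tooling.

```python
# Values copied from the Average and Throughput columns of the table above.
results = {
    "Current solution": {"avg": 2239, "throughput": 2.21114},
    "Optimized query with terms filter": {"avg": 832, "throughput": 5.95179},
    "Terms queries": {"avg": 873, "throughput": 5.66831},
}


def relative_gain(results, baseline="Current solution"):
    """Return the speedup of each variant relative to the baseline row."""
    base = results[baseline]
    return {
        label: {
            # How many times lower the average response time is.
            "avg_speedup": round(base["avg"] / row["avg"], 2),
            # How many times higher the throughput is.
            "throughput_gain": round(row["throughput"] / base["throughput"], 2),
        }
        for label, row in results.items()
        if label != baseline
    }


if __name__ == "__main__":
    for label, gain in relative_gain(results).items():
        print(label, gain)
```

Both optimized variants come out at roughly 2.6 to 2.7 times faster than the current solution on either metric, with the terms filter variant slightly ahead.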
Approach
The current query for retrieving subject counts returns results in 4-6 seconds. It can be optimized using the following approaches:
- Move filters from the aggregation to the query level.
- Use a single msearch request to retrieve subject counts without aggregation, using a basic term query for each subject.
Performance test configuration
Property | Value |
---|---|
Environment | |
mod-search | 3 nodes with CPU limit = 1024m and memoryLimit = 1200MB |
okapi | 3 nodes with CPU limit = 256m and memoryLimit = 536MB |
mod-authtoken | 2 nodes with CPU limit = 128m and memoryLimit = 360MB |
mod-login | 2 nodes with CPU limit = 128m and memoryLimit = 536MB |
mod-permissions | 2 nodes with CPU limit = 128m and memoryLimit = 536MB |
ElasticSearch | AWS-based; number of nodes: 4; resources: r6g.large (16 GiB of memory, 2 vCPUs, EBS only, 64-bit Arm platform); properties: timeout=30 |
Search queries | Terms |
Count of resources | 8,180,456 (bugfest kiwi dataset) |
Count of subjects | 4,078,882 |
Performance Test duration | 300 sec (5 min) |
V_USERS | 5 |
RAMP_UP | 5 sec |
HOSTNAME | falcon-perf-okapi.ci.folio.org |
Aggregated Results
Current solution
5_curr_5min_subjectBrowse_20220420.csv
Elasticsearch query:
{ "from": 0, "size": 0, "query": { "match_all": { "boost": 1.0 } }, "aggregations": { "subjects": { "terms": { "field": "plain_subjects", "size": 2, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_key": "asc" } ], "include": [ "s1", "s2" ] } } } }
Optimized query with terms filter
5_opt1_5min_subjectBrowse_20220420.csv
Elasticsearch query example
{ "from": 0, "size": 0, "query": { "bool": { "filter": [ { "terms": { "plain_subjects": [ "s1", "s2" ], "boost": 1.0 } } ], "adjust_pure_negative": true, "boost": 1.0 } }, "aggregations": { "subjects": { "terms": { "field": "plain_subjects", "size": 2, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_key": "asc" } ], "include": [ "s1", "s2" ] } } } }
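As a minimal sketch, the optimized request above could be assembled for an arbitrary list of subjects as follows. The function names are hypothetical and do not come from mod-search. Default-valued options from the example (`boost`, `adjust_pure_negative`, `shard_min_doc_count`, `show_term_doc_count_error`) are omitted, on the assumption that leaving them out does not change the result. The response parsing assumes the standard `aggregations.<name>.buckets` layout of a terms aggregation.

```python
def build_filtered_subject_count_query(subjects, field="plain_subjects"):
    """Build a terms aggregation whose input is narrowed by a query-level filter."""
    subjects = list(subjects)
    return {
        "from": 0,
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            # The filter restricts the documents before aggregation runs.
            "bool": {"filter": [{"terms": {field: subjects}}]}
        },
        "aggregations": {
            "subjects": {
                "terms": {
                    "field": field,
                    "size": len(subjects),
                    "min_doc_count": 1,
                    "order": [{"_count": "desc"}, {"_key": "asc"}],
                    # Documents can carry other subjects too, so the
                    # buckets are still limited to the requested values.
                    "include": subjects,
                }
            }
        },
    }


def counts_from_aggregation(response):
    """Map each returned bucket key to its document count."""
    buckets = response["aggregations"]["subjects"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

Calling `build_filtered_subject_count_query(["s1", "s2"])` yields a body equivalent to the example above, apart from the omitted default options.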
Term query based counts (msearch)
5_opt2_5min_subjectBrowse_20220420.csv
Elasticsearch multisearch request example
{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s1","boost":1.0}}},"track_total_hits":true}
{"index": "folio_instance_folio"}
{"from":0,"size":0,"query":{"term":{"plain_subjects":{"value":"s2","boost":1.0}}},"track_total_hits":true}
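The multisearch body is newline-delimited JSON: one header line and one query line per subject, ending with a newline. The sketch below shows one way to generate that body and read the counts back. The helper names are hypothetical. The response parsing assumes the Elasticsearch 7.x shape, where each entry in `responses` exposes `hits.total.value` and entries come back in request order.

```python
import json


def build_msearch_body(subjects, index="folio_instance_folio", field="plain_subjects"):
    """Return an NDJSON msearch body with one term query per subject."""
    lines = []
    for subject in subjects:
        # Header line: selects the index for the query that follows.
        lines.append(json.dumps({"index": index}))
        # Body line: count-only term query with exact total hits.
        lines.append(json.dumps({
            "from": 0,
            "size": 0,
            "query": {"term": {field: {"value": subject}}},
            "track_total_hits": True,
        }))
    # The msearch API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def counts_from_msearch(subjects, response):
    """Pair each subject with the total hit count of its sub-response."""
    return {
        subject: item["hits"]["total"]["value"]
        for subject, item in zip(subjects, response["responses"])
    }
```

Because `track_total_hits` is set to `true`, each total is an exact count rather than a lower bound, which is what allows the aggregation to be dropped.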