PERF-234 Investigate impact of all language analyzers enabled on search

Purpose

Establish a performance baseline for the most common searches that users might conduct. The baseline covers searches by keyword, title, subject, and contributor, as well as queries with boolean operators and filters.

All search operations were performed for 1, 10, 20, 50, and 100 simultaneous users.

Approach

Index the bugfest-kiwi dataset at https://falcon-perf-okapi.ci.folio.org and execute a list of queries to estimate the current performance of the mod-search application with all available languages enabled:

{
  "languageConfigs": [
    { "code": "eng" },
    { "code": "heb" },
    { "code": "ara" },
    { "code": "chi" },
    { "code": "spa" },
    { "code": "ger" },
    { "code": "fre" },
    { "code": "ita" },
    { "code": "jpn" },
    { "code": "kor" },
    { "code": "rus" },
    { "code": "swe" }
  ],
  "totalRecords": 12
}
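
Before a test run, a response of this shape can be checked to confirm that every analyzer is actually enabled. The following is a minimal sketch, not part of the original test setup: the function name `missing_languages` is hypothetical, and it only parses a JSON body shaped like the one above without calling any service.

```python
import json
from typing import List

# Language codes listed in the configuration response above.
EXPECTED_CODES = ["eng", "heb", "ara", "chi", "spa", "ger",
                  "fre", "ita", "jpn", "kor", "rus", "swe"]


def missing_languages(response_body: str, expected=EXPECTED_CODES) -> List[str]:
    """Return the expected language codes absent from a languageConfigs body."""
    payload = json.loads(response_body)
    configured = {cfg["code"] for cfg in payload.get("languageConfigs", [])}
    # Preserve the order of `expected` so the report is stable.
    return [code for code in expected if code not in configured]


if __name__ == "__main__":
    # Hypothetical sample body mirroring the structure shown above.
    sample = json.dumps({
        "languageConfigs": [{"code": code} for code in EXPECTED_CODES],
        "totalRecords": len(EXPECTED_CODES),
    })
    print(missing_languages(sample))
```

An empty result means all twelve codes are configured; any returned code would need to be added before the baseline is comparable.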

Performance test configuration

Property

Value

Environment

https://falcon-perf-okapi.ci.folio.org

mod-search: 2 nodes with CPU limit = 256m and memoryLimit = 536MB
okapi: 3 nodes with CPU limit = 256m and memoryLimit = 536MB
mod-authtoken: 2 nodes with CPU limit = 128m and memoryLimit = 360MB
mod-login: 2 nodes with CPU limit = 128m and memoryLimit = 536MB
mod-permissions: 2 nodes with CPU limit = 128m and memoryLimit = 536MB
ElasticSearch

AWS based

Number of nodes: 4
Resources: r6g.large, 16 GiB of Memory, 2 vCPUs, EBS only, 64-bit Arm platform
Properties: timeout=30
Search queries
Keyword search
keyword = "covid*"
keyword = "depress*"
keyword all "covid-19"
keyword any "covid-19"
keyword <> "climate change"
keyword all "depression"
keyword == "covid-19"
keyword any "climate change"
keyword <> "covid-19"
keyword all "climate change"
keyword = "climate ch*"
keyword == "climate change"
keyword any "depression"
keyword == "depression"
keyword <> "depression"
keyword all "*climate change*"
keyword = "*covid-19*"
keyword = "*depression*"

Title search
title all "Water in Africa"
title = "Water*"
title = "*Africa"
title any "Water in Africa"
title == "Water in Africa"
title <> "Water in Africa"
Subjects search
subjects = "Washington*"
subjects any "Washington Ibiza"
subjects all "Washington"
subjects <> "Washington"
subjects == "Washington"
subjects = "*Washington"
Contributors search
contributors == "Jon Brown"
contributors any "Brown John"
contributors == "Brown John"
contributors all "Jon Brown"
contributors <> "Jon Brown"
contributors <> "Brown John"
contributors = "*Brown"
contributors any "Jon Brown"
contributors = "Jon*"
contributors all "Brown John"
Boolean search
subjects all "washington" AND title all "international travel"
subjects all "washington" OR title all "international travel"
subjects = ("History" OR "Music")
subjects all "washington" NOT title all "international travel"
Filter search
languages == "eng"
languages == ("eng" OR "fre" OR "ita" OR "ukr")
languages == "eng" AND items.status.name == "Available"
languages == ("eng" OR "fre") AND items.status.name == ("Available" OR "In transit")
keyword all "water in africa" AND languages == "eng"
Count of resources: ~8.2 million (bugfest-iris dataset)
Index size: 26.6 GB per replica (53.2 GB total)
Performance test duration: 180 sec (3 min)
V_USERS: 1/10/20/50/100
RAMP_UP: 5 sec
HOSTNAME: falcon-perf-okapi.ci.folio.org
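
The CQL queries listed above contain spaces, quotes, wildcards, and parentheses, so they must be percent-encoded before being sent as a query string. The following is a minimal sketch of building such a request URL. The `/search/instances` path and the `query` and `limit` parameter names are assumptions that should be checked against the deployed mod-search API, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Host taken from the HOSTNAME property above.
HOSTNAME = "falcon-perf-okapi.ci.folio.org"
# Assumed search endpoint; verify against the deployed mod-search API.
SEARCH_PATH = "/search/instances"


def build_search_url(cql: str, limit: int = 100) -> str:
    """Return a search URL with the CQL query percent-encoded."""
    # urlencode escapes quotes, '=', '*', and parentheses, and turns spaces into '+'.
    params = urlencode({"query": cql, "limit": limit})
    return f"https://{HOSTNAME}{SEARCH_PATH}?{params}"


if __name__ == "__main__":
    for cql in ('keyword all "covid-19"', 'languages == ("eng" OR "fre")'):
        print(build_search_url(cql))
```

Encoding the query in one place keeps the query list itself readable and avoids malformed requests skewing the error counts.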

Errors

Every error in the test results means that mod-search received the following response from ElasticSearch:

"Exception happened: Error raised when there was an exception while talking to ES. ConnectionError: Read timed out. (read timeout=30))"

mod-search then proxies this error to the client in its response. In other words, the search took longer than the timeout configured for ES.
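
When post-processing the results, these timeouts can be separated from any other failures by matching the message quoted above. The following is a minimal sketch, assuming the error text reaches the client unchanged. The helper name `es_read_timeout` is hypothetical.

```python
import re

# Matches the timeout text quoted above; the group captures the timeout in seconds.
READ_TIMEOUT = re.compile(r"Read timed out\. \(read timeout=(\d+)\)")


def es_read_timeout(error_message: str):
    """Return the ES read timeout in seconds if the message reports one, else None."""
    match = READ_TIMEOUT.search(error_message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    message = ("Exception happened: Error raised when there was an exception "
               "while talking to ES. ConnectionError: Read timed out. "
               "(read timeout=30))")
    print(es_read_timeout(message))
```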

1 simultaneous user

Boolean search - 1 user

Contributors search - 1 user

Filter search - 1 user

Keyword search - 1 user

Subjects search - 1 user

Title search - 1 user

Performance test results for 1 user

1 user.zip

10 simultaneous users

Boolean search - 10 users

Contributors search - 10 users

Filter search - 10 users

Keyword search - 10 users


Subjects search - 10 users

Title search - 10 users

Performance test results for 10 users

10 users.zip

20 simultaneous users

Boolean search - 20 users

Contributors search - 20 users

Filter search - 20 users

Keyword search - 20 users

Subjects search - 20 users

Title search - 20 users

Performance test results for 20 users

20 users.zip

50 simultaneous users

Boolean search - 50 users

Contributors search - 50 users

Filter search - 50 users

Keyword search - 50 users

Subjects search - 50 users

Title search - 50 users 

Performance test results for 50 users

50 users.zip

100 simultaneous users

Boolean search - 100 users

Contributors search - 100 users

Filter search - 100 users

Keyword search - 100 users

Subjects search - 100 users

Title search - 100 users

Performance test results for 100 users

100 users.zip