Search result suggestions

The completion suggester provides auto-complete/search-as-you-type functionality. This is a navigational feature to guide users to relevant results as they are typing, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters. 

MSEARCH-13

For SPIKE MSEARCH-13, a suggestion endpoint has been implemented in the branch feature/msearch-13. The controller sends suggest requests to Elasticsearch and accepts two query parameters: query (the suggestion prefix to analyze) and limit (default value is 5).

-XGET .../search/instances/suggestions?query=book&limit=5 
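
As an illustration of the two parameters, the following is a minimal client-side sketch of how such a request URL could be assembled. The class name `SuggestionRequestUrl`, the `build` method, and the `baseUrl` argument are hypothetical (the host part is elided in the example above and is not filled in here); only the path, the parameter names, and the default limit of 5 come from this page.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SuggestionRequestUrl {

  // Default taken from this page: limit defaults to 5 when not supplied.
  static final int DEFAULT_LIMIT = 5;

  /**
   * Builds the suggestion request URL.
   * baseUrl is a hypothetical placeholder for the elided host part.
   */
  public static String build(String baseUrl, String query, Integer limit) {
    int effectiveLimit = (limit == null) ? DEFAULT_LIMIT : limit;
    // URL-encode the prefix so that spaces and punctuation survive transport.
    String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
    return baseUrl + "/search/instances/suggestions?query=" + encodedQuery
        + "&limit=" + effectiveLimit;
  }

  public static void main(String[] args) {
    // "http://localhost:8081" is an assumed address used only for this demo.
    System.out.println(build("http://localhost:8081", "book", null));
  }
}
```

Applying the default on the client merely mirrors the documented server-side default; a real client could equally omit the limit parameter.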

Required Elasticsearch index mappings for suggestion field

{
  "suggest": {
    "type": "completion",
    "analyzer": "simple",
    "max_input_length": 50    # inputs longer than 50 characters are truncated at index time to reduce memory consumption
  }
}

Other fields can be copied into this field using the copy_to option in the resource metadata description:

{
  ...
  "title": {
    "searchTypes": "sort",
    "inventorySearchTypes": [ "title", "keyword" ],
    "index": "multilang",
    "showInResponse": true,
    "mappings": {
      "copy_to": [ "sort_title", "suggest" ]
    }
  },
  ...
}

Elasticsearch suggest query:

{
  "from": 0,
  "size": 0,
  "_source": false,
  "suggest": {
    "completion": {
      "prefix": "book",           # suggestion query prefix
      "completion": {             # type of the suggestion
        "field": "suggest",       # field used as the source of suggestions (required)
        "size": 5,                # number of suggestions to return
        "skip_duplicates": true   # removes duplicates from the result
      }
    }
  }
}

Performance results of completion query:

  • 2.5 million instances indexed
  • Elasticsearch requires 2500 MB of Java heap to store the completion data
  • Elasticsearch response time: ~8-10 ms

MSEARCH-119

SPIKE MSEARCH-119 examines whether suggest results can be returned using a wildcard or prefix query.

Elasticsearch field mapping

{
  "keyword_suggest": {
    "type": "keyword",
    "normalizer": "keyword_lowercase",
    "store": true
  }
}

Elasticsearch query

{
  "from": 0,
  "size": 0,
  "query": {
    "prefix": {
      "keyword_suggest": {
        "value": "wit"
      }
    }
  },
  "_source": false,
  "stored_fields": [ "keyword_suggest" ]
}

The query returns a response like the following:

Search response
{
  "took": 89,
  "timed_out": false,
  "_shards": {
    "total": 4,
    "successful": 4,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1127,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "4c9664c8-565b-4245-984f-3dfa769abe8d",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "politieke machtsstrijd in en om de voornaamste belgische steden.1830-1848.",
            "politieke machtsstrijd in en om de voornaamste belgische steden.1830-1848.",
            "pro civitate. historische uitgaven. reeks in -8⁰, nr. 37",
            "collection histoire pro civitate.série in-8ono 37.",
            "witte, els",
            "belgium--politics and government--1830-1914",
            "politics and government",
            "belgium.",
            "1830-1914"
          ]
        }
      },
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "40cc07e3-a4b4-4386-8c4b-af2d424764db",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "organisation für innovationsentscheidungen;das promotoren-modell.",
            "organisation für innovationsentscheidungen;das promotoren-modell.",
            "schriften der kommission für wirtschaftlichen und sozialen wandel, bd. 2",
            "kommission für wirtschaftlichen und sozialen wandel.schriften,bd. 2.",
            "witte, eberhard",
            "technological innovations",
            "decision-making",
            "decision making.",
            "technological innovations."
          ]
        }
      },
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "90e29404-2178-4503-8202-1e56abf92a4d",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "brecht, as they knew him.hubert witt, editor. john peet, translator.",
            "erinnerungen an brecht.english.peet",
            "brecht, as they knew him.hubert witt, editor. john peet, translator.",
            "new world paperbacks",
            "witt, hubert",
            "brecht, bertolt,--1898-1956"
          ]
        }
      },
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "b755f723-f279-43eb-8e53-938930efcc99",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "political maxims of the state of hollandcomprehending a general view of the civil government of that republic, and the principles on which it is founded : the nature, rise, and progress of the commerce of its subjects, and of their true interests with respect to all their neighbours /by john de witt ; translated from the dutch original, which contains many curious passages not to be found in any of the french versions ; to which is prefixed, historical memoirs of the two illustrious brothers cornelius and john de witt.",
            "aanwysing der heilsame politike gronden en maximen van de republike van holland en west-vriesland.english",
            "political maxims of the state of hollandcomprehending a general view of the civil government of that republic, and the principles on which it is founded : the nature, rise, and progress of the commerce of its subjects, and of their true interests with respect to all their neighbours /by john de witt ; translated from the dutch original, which contains many curious passages not to be found in any of the french versions ; to which is prefixed, historical memoirs of the two illustrious brothers cornelius and john de witt.",
            "goldsmiths'-kress library of economic literature ;no. 8031.2.",
            "court, pieter de la, approximately 1618-1685.",
            "witt, johan de, 1625-1672.",
            "witt, cornelis de, 1623-1672.",
            "witt, johan de,--1625-1672.",
            "witt, cornelis de,--1623-1672.",
            "political science--netherlands.",
            "netherlands--commercial policy.",
            "industries--netherlands.",
            "netherlands--foreign relations--1648-1714.",
            "commercial policy.",
            "diplomatic relations.",
            "industries.",
            "political science.",
            "netherlands.",
            "1648-1714"
          ]
        }
      },
      {
        "_index": "instance_diku",
        "_type": "_doc",
        "_id": "0fa3fb53-e913-4694-bcef-9e4a820e55d6",
        "_score": 1.0,
        "_routing": "diku",
        "fields": {
          "keyword_suggest": [
            "the psychology of the salem witchcraft excitement of 1692and its practical application to our own time.",
            "psychology of the salem witchcraft excitement of 1692and its practical application to our own time.",
            "library of american civilization ;lac 14279.",
            "beard, george m, (george miller), 1839-1883.",
            "guiteau, charles j,--(charles julius),--1841-1882.",
            "witchcraft--massachusetts--salem.",
            "forensic psychology.",
            "guiteau, charles j.--(charles julius),--1841-1882.",
            "witchcraft.",
            "massachusetts--salem."
          ]
        }
      }
    ]
  }
}

The values of the keyword_suggest field are then filtered in Java code, using the startsWith() method, to retrieve the relevant suggestion terms.
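
The response above shows why this extra step is needed: each hit returns all stored values of the field, not just the ones that match the prefix. A minimal sketch of that filtering step follows. The class name `PrefixSuggestionFilter`, the `filter` method, and the idea of passing the hits as a list of value lists are assumptions made for illustration, not the spike's actual implementation; only `String.startsWith()` is taken from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PrefixSuggestionFilter {

  /**
   * Collects up to `limit` distinct values that start with the given prefix.
   * `hits` holds one list of keyword_suggest values per search hit.
   */
  public static List<String> filter(List<List<String>> hits, String prefix, int limit) {
    // The returned values are lowercase, so the prefix is lowercased before comparing.
    String normalizedPrefix = prefix.toLowerCase(Locale.ROOT);
    // LinkedHashSet drops duplicates while keeping the order of first occurrence.
    Set<String> suggestions = new LinkedHashSet<>();
    for (List<String> values : hits) {
      for (String value : values) {
        if (suggestions.size() >= limit) {
          return new ArrayList<>(suggestions);
        }
        if (value.startsWith(normalizedPrefix)) {
          suggestions.add(value);
        }
      }
    }
    return new ArrayList<>(suggestions);
  }

  public static void main(String[] args) {
    // Sample values copied from the response shown earlier on this page.
    List<List<String>> hits = List.of(
        List.of("witte, els", "politics and government", "belgium."),
        List.of("witte, eberhard", "decision-making"),
        List.of("witt, hubert", "new world paperbacks", "witte, els"));
    filter(hits, "Wit", 5).forEach(System.out::println);
  }
}
```

The order of the suggestions simply follows the order of the hits, which reflects the lack of relevance ranking in this approach.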

Disadvantages of this approach:

  • It returns N arbitrary documents from the Elasticsearch index with no relevance ranking (score = 1 for all search hits)
  • With the copy_to functionality, all values are returned in lowercase
  • Fuzzy search is not possible; suggestions can be produced only by exact prefix match

Performance results:

  • 2.5 million instances indexed
  • Elasticsearch response time: ~20-50 ms
  • Reindexing is slightly faster and does not require much Java heap

Performance test results: