Performance of getting source records and counting the total number of records

While working on bug MODSOURCE-85, the following queries were measured against a database containing 7.6 million records:

  1. Query against source_records_view with LIMIT 10 and a nested subquery that calculates the total number of records.

    EXPLAIN ANALYZE SELECT *,
    	  (SELECT COUNT(_id)
    	   FROM diku_mod_source_record_storage.records
    	   WHERE (lower(f_unaccent(diku_mod_source_record_storage.records.jsonb->>'deleted')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')))) AS totalRows
    FROM diku_mod_source_record_storage.source_records_view  
    WHERE lower(f_unaccent(diku_mod_source_record_storage.source_records_view.jsonb->>'deleted')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'))
    LIMIT 10
    Query plan
    "Limit  (cost=2590083.17..2617567.28 rows=10 width=56) (actual time=224095.763..224106.902 rows=10 loops=1)"
    "  InitPlan 1 (returns $0)"
    "    ->  Aggregate  (cost=2590071.75..2590071.75 rows=1 width=8) (actual time=224093.579..224093.579 rows=1 loops=1)"
    "          ->  Seq Scan on records records_1  (cost=0.00..2590067.74 rows=1602 width=16) (actual time=0.315..220471.600 rows=7641917 loops=1)"
    "                Filter: (lower(f_unaccent((jsonb ->> 'deleted'::text))) ~ '(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'::text)"
    "  ->  Merge Left Join  (cost=11.41..105016785.14 rows=38210 width=56) (actual time=224095.762..224106.892 rows=10 loops=1)"
    "        Merge Cond: ((records.jsonb ->> 'parsedRecordId'::text) = (marc_records.jsonb ->> 'id'::text))"
    "        Filter: (lower(f_unaccent((json_build_object('recordId', (records.jsonb ->> 'id'::text), 'snapshotId', (records.jsonb ->> 'snapshotId'::text), 'recordType', (records.jsonb ->> 'recordType'::text), 'deleted', (records.jsonb ->> 'deleted'::text), 'order', (records.jsonb ->> 'order'::text), 'additionalInfo', (records.jsonb -> 'additionalInfo'::text), 'metadata', (records.jsonb -> 'metadata'::text), 'rawRecord', raw_records.jsonb, 'parsedRecord', COALESCE(marc_records.jsonb)) ->> 'deleted'::text))) ~ '(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'::text)"
    "        ->  Nested Loop  (cost=1.12..77965435.14 rows=7641928 width=1808) (actual time=0.049..0.321 rows=10 loops=1)"
    "              ->  Index Scan using records_parsed_record_id_idx_btree on records  (cost=0.56..24763758.44 rows=7641928 width=550) (actual time=0.016..0.038 rows=10 loops=1)"
    "                    Index Cond: ((jsonb ->> 'parsedRecordId'::text) IS NOT NULL)"
    "              ->  Index Scan using raw_records_id_idx_btree on raw_records  (cost=0.56..6.95 rows=1 width=1258) (actual time=0.023..0.024 rows=1 loops=10)"
    "                    Index Cond: ((jsonb ->> 'id'::text) = (records.jsonb ->> 'rawRecordId'::text))"
    "        ->  Index Scan using marc_records_id_idx_btree on marc_records  (cost=0.56..24740116.58 rows=7638396 width=587) (actual time=0.009..0.046 rows=10 loops=1)"
    "Planning time: 2.454 ms"
    "Execution time: 224107.039 ms"
  2. The query that calculates the total number of records with the filter condition applied (an estimated-count alternative is sketched after this list).

    EXPLAIN ANALYZE SELECT COUNT(_id)
    FROM diku_mod_source_record_storage.records
    WHERE (lower(f_unaccent(diku_mod_source_record_storage.records.jsonb->>'deleted')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))')))
    Query plan
    "Aggregate  (cost=2590071.75..2590071.75 rows=1 width=8) (actual time=210258.396..210258.396 rows=1 loops=1)"
    "  ->  Seq Scan on records  (cost=0.00..2590067.74 rows=1602 width=16) (actual time=0.097..206949.371 rows=7641917 loops=1)"
    "        Filter: (lower(f_unaccent((jsonb ->> 'deleted'::text))) ~ '(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'::text)"
    "Planning time: 0.224 ms"
    "Execution time: 210258.430 ms"
  3. The query that calculates the total number of records without any filter condition.

    EXPLAIN ANALYZE SELECT COUNT(_id) FROM diku_mod_source_record_storage.records
    Query plan
    "Finalize Aggregate  (cost=586653.92..586653.93 rows=1 width=8) (actual time=17830.882..17830.883 rows=1 loops=1)"
    "  ->  Gather  (cost=586653.71..586653.92 rows=2 width=8) (actual time=17830.601..17839.107 rows=3 loops=1)"
    "        Workers Planned: 2"
    "        Workers Launched: 2"
    "        ->  Partial Aggregate  (cost=585653.71..585653.72 rows=1 width=8) (actual time=17817.788..17817.788 rows=1 loops=3)"
    "              ->  Parallel Seq Scan on records  (cost=0.00..577693.37 rows=3184137 width=16) (actual time=0.464..17129.066 rows=2547306 loops=3)"
    "Planning time: 0.118 ms"
    "Execution time: 17839.161 ms"
  4. Query against source_records_view with LIMIT 10 (the same as query 1, but without the nested count subquery).

    EXPLAIN ANALYZE SELECT *
    FROM diku_mod_source_record_storage.source_records_view
    WHERE lower(f_unaccent(diku_mod_source_record_storage.source_records_view.jsonb->>'deleted')) ~ lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'))
    LIMIT 10
    Query plan
    "Limit  (cost=11.41..27495.52 rows=10 width=48) (actual time=2.389..15.713 rows=10 loops=1)"
    "  ->  Merge Left Join  (cost=11.41..105016785.14 rows=38210 width=48) (actual time=2.388..15.703 rows=10 loops=1)"
    "        Merge Cond: ((records.jsonb ->> 'parsedRecordId'::text) = (marc_records.jsonb ->> 'id'::text))"
    "        Filter: (lower(f_unaccent((json_build_object('recordId', (records.jsonb ->> 'id'::text), 'snapshotId', (records.jsonb ->> 'snapshotId'::text), 'recordType', (records.jsonb ->> 'recordType'::text), 'deleted', (records.jsonb ->> 'deleted'::text), 'order', (records.jsonb ->> 'order'::text), 'additionalInfo', (records.jsonb -> 'additionalInfo'::text), 'metadata', (records.jsonb -> 'metadata'::text), 'rawRecord', raw_records.jsonb, 'parsedRecord', COALESCE(marc_records.jsonb)) ->> 'deleted'::text))) ~ '(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'::text)"
    "        ->  Nested Loop  (cost=1.12..77965435.14 rows=7641928 width=1808) (actual time=0.099..0.932 rows=10 loops=1)"
    "              ->  Index Scan using records_parsed_record_id_idx_btree on records  (cost=0.56..24763758.44 rows=7641928 width=550) (actual time=0.012..0.107 rows=10 loops=1)"
    "                    Index Cond: ((jsonb ->> 'parsedRecordId'::text) IS NOT NULL)"
    "              ->  Index Scan using raw_records_id_idx_btree on raw_records  (cost=0.56..6.95 rows=1 width=1258) (actual time=0.076..0.077 rows=1 loops=10)"
    "                    Index Cond: ((jsonb ->> 'id'::text) = (records.jsonb ->> 'rawRecordId'::text))"
    "        ->  Index Scan using marc_records_id_idx_btree on marc_records  (cost=0.56..24740116.58 rows=7638396 width=587) (actual time=0.009..0.155 rows=10 loops=1)"
    "Planning time: 3.728 ms"
    "Execution time: 15.810 ms"

Summary of query execution times:

Query number    Execution time (ms)
1               224107
2               210258
3               17839
4               15
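
Applying the count_estimate sketch above to the filter from query 2 also shows the main caveat of estimated counts: in the plans above the planner estimates rows=1602 for this filter while the actual count is 7,641,917, so estimates for expression filters that have no matching statistics can be far off.

    -- Sketch: planner-based estimate for the filtered count from query 2 (returns the
    -- planner's row estimate, not the exact total).
    SELECT diku_mod_source_record_storage.count_estimate(
      $q$SELECT _id
         FROM diku_mod_source_record_storage.records
         WHERE lower(f_unaccent(jsonb ->> 'deleted')) ~
               lower(f_unaccent('(^|[[:punct:]]|[[:space:]]|(?=[[:punct:]]|[[:space:]]))false($|[[:punct:]]|[[:space:]]|(?<=[[:punct:]]|[[:space:]]))'))$q$
    ) AS estimated_total;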