Improve performance of DB query with increasing Offset parameter

In this investigation, the Rancher environment and pgAdmin were used to run the queries.

Results of investigation

As per the report, the query under test looks like the following:

EXPLAIN ANALYZE WITH "cte" AS
	(SELECT COUNT(*)
		FROM FS09000000_MOD_SOURCE_RECORD_STORAGE.RECORDS_LB RLB
		WHERE ((RLB."state" = 'OLD'
										OR RLB."state" = 'ACTUAL'
										OR RLB."leader_record_status" = 'n'
										OR RLB."leader_record_status" = 'c'
										OR RLB."leader_record_status" = 'p')
									AND RLB."updated_date" <= CAST(CURRENT_TIMESTAMP AS TIMESTAMP WITH TIME ZONE)
									AND RLB."record_type" = 'MARC_BIB'
									AND RLB."leader_record_status" IS NOT NULL))
SELECT RLB."id",
	RLB."snapshot_id",
	RLB."matched_id",
	RLB."generation",
	RLB."record_type",
	RLB."external_id",
	RLB."state",
	RLB."leader_record_status",
	RLB."order",
	RLB."suppress_discovery",
	RLB."created_by_user_id",
	RLB."created_date",
	RLB."updated_by_user_id",
	RLB."updated_date",
	RLB."external_hrid",
	"marc_records_lb"."content",
	"count"
FROM FS09000000_MOD_SOURCE_RECORD_STORAGE.RECORDS_LB RLB
LEFT OUTER JOIN FS09000000_MOD_SOURCE_RECORD_STORAGE.MARC_RECORDS_LB ON RLB."id" = "marc_records_lb"."id"
RIGHT OUTER JOIN
	(SELECT *
		FROM "cte") AS "alias_80949780" ON 1 = 1
WHERE ((RLB."state" = 'OLD'
								OR RLB."state" = 'ACTUAL'
								OR RLB."leader_record_status" = 'n'
								OR RLB."leader_record_status" = 'c'
								OR RLB."leader_record_status" = 'p')
							AND RLB."updated_date" <= CAST(CURRENT_TIMESTAMP AS TIMESTAMP WITH TIME ZONE)
							AND RLB."record_type" = 'MARC_BIB'
							AND RLB."leader_record_status" IS NOT NULL)
ORDER BY RLB."id" ASC
LIMIT 101
OFFSET {offset parameter can be either 2220000, or 3526100}

According to the results, the query above degrades noticeably as the offset grows: it takes about 13 s with OFFSET 2220000 and about 21 s with OFFSET 3526100.

To address the issue, one of the possible solutions (and the most effective one according to the article) is to create an index on the ID column and compare IDs instead of using OFFSET (keyset pagination). With this approach, each subsequent request uses the last ID returned by the previous page as its starting point, as shown in the sketch below.
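In general terms, the pattern looks like the following sketch (my_table and payload are placeholder names, not part of the module schema):

-- OFFSET-based page: the server still has to read and discard all skipped rows.
SELECT id, payload
FROM my_table
ORDER BY id ASC
LIMIT 101
OFFSET 2220000;

-- Keyset page: jump straight to the boundary via the index on id;
-- the last id returned by the previous page becomes the first row of this page.
SELECT id, payload
FROM my_table
WHERE id >= '<last id returned by the previous page>'
ORDER BY id ASC
LIMIT 101;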

The first step is to create the uuid_idx index on the ID column of the records_lb table:

CREATE INDEX uuid_idx ON fs09000000_mod_source_record_storage.records_lb USING btree (id ASC) 
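Optionally, the created index can be verified via the standard pg_indexes catalog view (schema and table names as above):

SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'fs09000000_mod_source_record_storage'
	AND tablename = 'records_lb';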

Then, determine the ID of the first record at offset 2220000 by running the following query:

SELECT RLB.ID
FROM FS09000000_MOD_SOURCE_RECORD_STORAGE.RECORDS_LB RLB
WHERE (RLB."state" = 'OLD'
							OR RLB."state" = 'ACTUAL'
							OR RLB."leader_record_status" = 'n'
							OR RLB."leader_record_status" = 'c'
							OR RLB."leader_record_status" = 'p')
	AND RLB."updated_date" <= CAST(CURRENT_TIMESTAMP AS TIMESTAMP WITH TIME ZONE)
	AND RLB."record_type" = 'MARC_BIB'
	AND RLB."leader_record_status" IS NOT NULL
ORDER BY ID ASC
LIMIT 1
OFFSET 2220000

And the result is 477c423d-3a3c-45ad-9ee7-0ec107110dcd.

Next, use the ID obtained above instead of OFFSET:

EXPLAIN ANALYZE WITH "cte" AS
	(SELECT COUNT(*)
		FROM FS09000000_MOD_SOURCE_RECORD_STORAGE.RECORDS_LB RLB
		WHERE ((RLB."state" = 'OLD'
										OR RLB."state" = 'ACTUAL'
										OR RLB."leader_record_status" = 'n'
										OR RLB."leader_record_status" = 'c'
										OR RLB."leader_record_status" = 'p')
									AND RLB."updated_date" <= CAST(CURRENT_TIMESTAMP AS TIMESTAMP WITH TIME ZONE)
									AND RLB."record_type" = 'MARC_BIB'
									AND RLB."leader_record_status" IS NOT NULL))
SELECT RLB."id",
	RLB."snapshot_id",
	RLB."matched_id",
	RLB."generation",
	RLB."record_type",
	RLB."external_id",
	RLB."state",
	RLB."leader_record_status",
	RLB."order",
	RLB."suppress_discovery",
	RLB."created_by_user_id",
	RLB."created_date",
	RLB."updated_by_user_id",
	RLB."updated_date",
	RLB."external_hrid",
	"marc_records_lb"."content",
	"count"
FROM FS09000000_MOD_SOURCE_RECORD_STORAGE.RECORDS_LB RLB
LEFT OUTER JOIN FS09000000_MOD_SOURCE_RECORD_STORAGE.MARC_RECORDS_LB ON RLB."id" = "marc_records_lb"."id"
RIGHT OUTER JOIN
	(SELECT *
		FROM "cte") AS "alias_80949780" ON 1 = 1
WHERE ((RLB."state" = 'OLD'
								OR RLB."state" = 'ACTUAL'
								OR RLB."leader_record_status" = 'n'
								OR RLB."leader_record_status" = 'c'
								OR RLB."leader_record_status" = 'p')
							AND RLB."updated_date" <= CAST(CURRENT_TIMESTAMP AS TIMESTAMP WITH TIME ZONE)
							AND RLB."record_type" = 'MARC_BIB'
							AND RLB."leader_record_status" IS NOT NULL)
	AND RLB.ID >= '477c423d-3a3c-45ad-9ee7-0ec107110dcd'
ORDER BY RLB."id" ASC
LIMIT 101

And the result:

"QUERY PLAN" "Limit (cost=374040.07..374144.07 rows=101 width=1293) (actual time=1020.304..1023.703 rows=101 loops=1)" " -> Nested Loop (cost=374040.07..6257256.02 rows=5713333 width=1293) (actual time=1020.303..1023.692 rows=101 loops=1)" " -> Gather Merge (cost=1001.01..5812800.28 rows=5713333 width=1285) (actual time=5.767..9.036 rows=101 loops=1)" " Workers Planned: 2" " Workers Launched: 2" " -> Nested Loop Left Join (cost=0.99..5152339.87 rows=2380555 width=1285) (actual time=0.050..0.391 rows=52 loops=3)" " -> Parallel Index Scan using records_lb_pkey on records_lb rlb (cost=0.56..1386885.59 rows=2380555 width=141) (actual time=0.033..0.132 rows=52 loops=3)" " Index Cond: (id >= '477c423d-3a3c-45ad-9ee7-0ec107110dcd'::uuid)" " Filter: ((leader_record_status IS NOT NULL) AND (record_type = 'MARC_BIB'::fs09000000_mod_source_record_storage.record_type) AND (updated_date <= CURRENT_TIMESTAMP) AND ((state = 'OLD'::fs09000000_mod_source_record_storage.record_state) OR (state = 'ACTUAL'::fs09000000_mod_source_record_storage.record_state) OR (leader_record_status = 'n'::bpchar) OR (leader_record_status = 'c'::bpchar) OR (leader_record_status = 'p'::bpchar)))" " -> Index Scan using uuid_idx_mrl on marc_records_lb (cost=0.43..1.58 rows=1 width=1160) (actual time=0.004..0.004 rows=1 loops=156)" " Index Cond: (id = rlb.id)" " -> Materialize (cost=373039.06..373039.08 rows=1 width=8) (actual time=10.045..10.046 rows=1 loops=101)" " -> Finalize Aggregate (cost=373039.06..373039.07 rows=1 width=8) (actual time=1014.527..1014.575 rows=1 loops=1)" " -> Gather (cost=373038.84..373039.05 rows=2 width=8) (actual time=1014.517..1014.568 rows=3 loops=1)" " Workers Planned: 2" " Workers Launched: 2" " -> Partial Aggregate (cost=372038.84..372038.85 rows=1 width=8) (actual time=1010.701..1010.702 rows=1 loops=3)" " -> Parallel Seq Scan on records_lb rlb_1 (cost=0.00..363715.22 rows=3329447 width=0) (actual time=0.009..836.649 rows=2648385 loops=3)" " Filter: ((leader_record_status IS NOT NULL) AND (record_type = 'MARC_BIB'::fs09000000_mod_source_record_storage.record_type) AND (updated_date <= CURRENT_TIMESTAMP) AND ((state = 'OLD'::fs09000000_mod_source_record_storage.record_state) OR (state = 'ACTUAL'::fs09000000_mod_source_record_storage.record_state) OR (leader_record_status = 'n'::bpchar) OR (leader_record_status = 'c'::bpchar) OR (leader_record_status = 'p'::bpchar)))" " Rows Removed by Filter: 15488" "Planning Time: 0.624 ms" "Execution Time: 1023.773 ms"

As you can see above, the index-based query took only about 1 s against 13 s with OFFSET.

After that, run the following query to determine the ID of the first record at offset 3526100:

SELECT RLB.ID
FROM FS09000000_MOD_SOURCE_RECORD_STORAGE.RECORDS_LB RLB
WHERE (RLB."state" = 'OLD'
							OR RLB."state" = 'ACTUAL'
							OR RLB."leader_record_status" = 'n'
							OR RLB."leader_record_status" = 'c'
							OR RLB."leader_record_status" = 'p')
	AND RLB."updated_date" <= CAST(CURRENT_TIMESTAMP AS TIMESTAMP WITH TIME ZONE)
	AND RLB."record_type" = 'MARC_BIB'
	AND RLB."leader_record_status" IS NOT NULL
ORDER BY ID ASC
LIMIT 1
OFFSET 3526100

The result is 718e8758-6bc0-4357-814d-0ad03e61e1e3.

Finally, use the ID obtained above instead of OFFSET to run the main query and check whether performance degrades:

EXPLAIN ANALYZE WITH "cte" AS
	(SELECT COUNT(*)
		FROM FS09000000_MOD_SOURCE_RECORD_STORAGE.RECORDS_LB RLB
		WHERE ((RLB."state" = 'OLD'
										OR RLB."state" = 'ACTUAL'
										OR RLB."leader_record_status" = 'n'
										OR RLB."leader_record_status" = 'c'
										OR RLB."leader_record_status" = 'p')
									AND RLB."updated_date" <= CAST(CURRENT_TIMESTAMP AS TIMESTAMP WITH TIME ZONE)
									AND RLB."record_type" = 'MARC_BIB'
									AND RLB."leader_record_status" IS NOT NULL))
SELECT RLB."id",
	RLB."snapshot_id",
	RLB."matched_id",
	RLB."generation",
	RLB."record_type",
	RLB."external_id",
	RLB."state",
	RLB."leader_record_status",
	RLB."order",
	RLB."suppress_discovery",
	RLB."created_by_user_id",
	RLB."created_date",
	RLB."updated_by_user_id",
	RLB."updated_date",
	RLB."external_hrid",
	"marc_records_lb"."content",
	"count"
FROM FS09000000_MOD_SOURCE_RECORD_STORAGE.RECORDS_LB RLB
LEFT OUTER JOIN FS09000000_MOD_SOURCE_RECORD_STORAGE.MARC_RECORDS_LB ON RLB."id" = "marc_records_lb"."id"
RIGHT OUTER JOIN
	(SELECT *
		FROM "cte") AS "alias_80949780" ON 1 = 1
WHERE ((RLB."state" = 'OLD'
								OR RLB."state" = 'ACTUAL'
								OR RLB."leader_record_status" = 'n'
								OR RLB."leader_record_status" = 'c'
								OR RLB."leader_record_status" = 'p')
							AND RLB."updated_date" <= CAST(CURRENT_TIMESTAMP AS TIMESTAMP WITH TIME ZONE)
							AND RLB."record_type" = 'MARC_BIB'
							AND RLB."leader_record_status" IS NOT NULL)
	AND RLB.ID >= '718e8758-6bc0-4357-814d-0ad03e61e1e3'
ORDER BY RLB."id" ASC
LIMIT 101

And the result:

"QUERY PLAN" "Limit (cost=374040.07..374163.18 rows=101 width=1293) (actual time=978.703..982.088 rows=101 loops=1)" " -> Nested Loop (cost=374040.07..5779773.71 rows=4434825 width=1293) (actual time=978.702..982.075 rows=101 loops=1)" " -> Gather Merge (cost=1001.01..5351299.31 rows=4434825 width=1285) (actual time=5.686..8.946 rows=101 loops=1)" " Workers Planned: 2" " Workers Launched: 2" " -> Nested Loop Left Join (cost=0.99..4838410.45 rows=1847844 width=1285) (actual time=0.057..1.322 rows=52 loops=3)" " -> Parallel Index Scan using records_lb_pkey on records_lb rlb (cost=0.56..1312675.28 rows=1847844 width=141) (actual time=0.040..0.163 rows=52 loops=3)" " Index Cond: (id >= '718e8758-6bc0-4357-814d-0ad03e61e1e3'::uuid)" " Filter: ((leader_record_status IS NOT NULL) AND (record_type = 'MARC_BIB'::fs09000000_mod_source_record_storage.record_type) AND (updated_date <= CURRENT_TIMESTAMP) AND ((state = 'OLD'::fs09000000_mod_source_record_storage.record_state) OR (state = 'ACTUAL'::fs09000000_mod_source_record_storage.record_state) OR (leader_record_status = 'n'::bpchar) OR (leader_record_status = 'c'::bpchar) OR (leader_record_status = 'p'::bpchar)))" " Rows Removed by Filter: 0" " -> Index Scan using uuid_idx_mrl on marc_records_lb (cost=0.43..1.91 rows=1 width=1160) (actual time=0.021..0.021 rows=1 loops=156)" " Index Cond: (id = rlb.id)" " -> Materialize (cost=373039.06..373039.08 rows=1 width=8) (actual time=9.634..9.634 rows=1 loops=101)" " -> Finalize Aggregate (cost=373039.06..373039.07 rows=1 width=8) (actual time=973.008..973.057 rows=1 loops=1)" " -> Gather (cost=373038.84..373039.05 rows=2 width=8) (actual time=972.998..973.050 rows=3 loops=1)" " Workers Planned: 2" " Workers Launched: 2" " -> Partial Aggregate (cost=372038.84..372038.85 rows=1 width=8) (actual time=968.951..968.952 rows=1 loops=3)" " -> Parallel Seq Scan on records_lb rlb_1 (cost=0.00..363715.22 rows=3329447 width=0) (actual time=0.010..801.974 rows=2648385 loops=3)" " Filter: ((leader_record_status IS NOT NULL) AND (record_type = 'MARC_BIB'::fs09000000_mod_source_record_storage.record_type) AND (updated_date <= CURRENT_TIMESTAMP) AND ((state = 'OLD'::fs09000000_mod_source_record_storage.record_state) OR (state = 'ACTUAL'::fs09000000_mod_source_record_storage.record_state) OR (leader_record_status = 'n'::bpchar) OR (leader_record_status = 'c'::bpchar) OR (leader_record_status = 'p'::bpchar)))" " Rows Removed by Filter: 15488" "Planning Time: 1.519 ms" "Execution Time: 982.182 ms"

You can see that it again took about 1 s against 21 s with OFFSET; in other words, the query no longer slows down as the offset grows.

Results of comparison


Approach                     Time for offset = 2_220_000, s    Time for offset = 3_526_100, s    Performance degradation, s
Old approach with offsets    13                                21                                8
New approach with indexes    1                                 1                                 0

Disadvantages of the approach with indexes

Using an index on the column can be less effective if the table is updated frequently: every insert or update also has to maintain the balanced B-tree, which adds write overhead.
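One way to check whether the index pays off on a write-heavy table is to watch its size and usage over time via the standard catalog views (schema and index names as created above; these checks are a suggestion, not measurements from this investigation):

SELECT pg_size_pretty(pg_relation_size('fs09000000_mod_source_record_storage.uuid_idx')) AS index_size;

SELECT idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'fs09000000_mod_source_record_storage'
	AND indexrelname = 'uuid_idx';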