Channel: Active questions tagged cte - Database Administrators Stack Exchange

Why is PostgreSQL performing a sequential scan, except when my CTE is materialized?


The issue

My application is encountering performance issues on a query that, to the best of my understanding, should be performant. In this post, I have simplified the query and schema while still reproducing the original issue.

On my data, the following query takes approximately 55 seconds to complete, including a lengthy sequential scan and hash join:

    EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
    WITH temp_stock_history AS (
        SELECT MIN(date) AS date, product_id
        FROM testing.stock_history SH_
        WHERE SH_.date >= '2024-01-18 12:00+0100'
        GROUP BY SH_.product_id
    )
    SELECT COUNT(id)
    FROM testing.stock_history SH
    WHERE (date, product_id) IN (
        SELECT date, product_id
        FROM temp_stock_history
    );

The exact same query with the CTE materialized (WITH temp_stock_history AS MATERIALIZED (...)) uses only index scans and hash aggregates, and completes in about 2 seconds.
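
For reference, here is the materialized variant written out in full. It is the query above with only the MATERIALIZED keyword added to the CTE; nothing else is changed.

```sql
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
WITH temp_stock_history AS MATERIALIZED (
    -- Earliest stock_history row per product on or after the cutoff
    SELECT MIN(date) AS date, product_id
    FROM testing.stock_history SH_
    WHERE SH_.date >= '2024-01-18 12:00+0100'
    GROUP BY SH_.product_id
)
SELECT COUNT(id)
FROM testing.stock_history SH
WHERE (date, product_id) IN (
    SELECT date, product_id
    FROM temp_stock_history
);
```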

I am trying to understand why the query plans between the materialized and non-materialized versions of the query are so wildly different.

The schema

    => \d testing.stock_history
                                           Table "testing.stock_history"
       Column    |           Type           | Collation | Nullable |                      Default
    -------------+--------------------------+-----------+----------+---------------------------------------------------
     product_id  | bigint                   |           | not null |
     date        | timestamp with time zone |           | not null |
     amount      | bigint                   |           | not null |
     location_id | bigint                   |           |          |
     id          | bigint                   |           | not null | nextval('testing.stock_history_id_seq'::regclass)
    Indexes:
        "stock_history_pkey" PRIMARY KEY, btree (id)
        "stock_history_date_product_id_idx" btree (date, product_id)
        "stock_history_location_id_idx" btree (location_id)
        "stock_history_product_id_location_id_date_key" UNIQUE CONSTRAINT, btree (product_id, location_id, date) NULLS NOT DISTINCT
    Foreign-key constraints:
        "stock_history_product_id_fkey" FOREIGN KEY (product_id) REFERENCES testing.products(id)

Query plans

First I will give each plan with a brief interpretation. After that, I will analyze the differences and highlight the parts that surprise me.

For the materialized CTE

You can see that building the materialized CTE (from sh_, using the index on date and product_id) is performed with an Index Only Scan, with a cost of 13k and an actual time of 111 ms. Then a join (Nested Loop) between the CTE and an Index Scan on the same index for sh has a cost of about 89k and an actual time of about 1.5 s.

                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=151878.19..151878.20 rows=1 width=8) (actual time=2257.709..2257.713 rows=1 loops=1)
       Output: count(sh.id)
       Buffers: shared hit=831576, temp read=770 written=2385
       CTE temp_stock_history
         ->  HashAggregate  (cost=32040.93..35891.15 rows=103051 width=16) (actual time=379.824..534.225 rows=166060 loops=1)
               Output: min(sh_.date), sh_.product_id
               Group Key: sh_.product_id
               Batches: 5  Memory Usage: 8241kB  Disk Usage: 6928kB
               Buffers: shared hit=1160, temp read=494 written=1227
               ->  Index Only Scan using stock_history_date_product_id_idx on testing.stock_history sh_  (cost=0.56..13543.65 rows=288738 width=16) (actual time=0.093..110.797 rows=315682 loops=1)
                     Output: sh_.date, sh_.product_id
                     Index Cond: (sh_.date >= '2024-01-18 11:00:00+00'::timestamp with time zone)
                     Heap Fetches: 0
                     Buffers: shared hit=1160
       ->  Nested Loop  (cost=2576.84..89334.80 rows=10660895 width=8) (actual time=765.940..2224.865 rows=178022 loops=1)
             Output: sh.id
             Buffers: shared hit=831576, temp read=770 written=2385
             ->  HashAggregate  (cost=2576.28..2679.33 rows=10305 width=16) (actual time=765.793..893.761 rows=166060 loops=1)
                   Output: temp_stock_history.date, temp_stock_history.product_id
                   Group Key: temp_stock_history.date, temp_stock_history.product_id
                   Batches: 5  Memory Usage: 8241kB  Disk Usage: 3432kB
                   Buffers: shared hit=1160, temp read=770 written=2385
                   ->  CTE Scan on temp_stock_history  (cost=0.00..2061.02 rows=103051 width=16) (actual time=379.830..641.721 rows=166060 loops=1)
                         Output: temp_stock_history.date, temp_stock_history.product_id
                         Buffers: shared hit=1160, temp read=494 written=1754
             ->  Index Scan using stock_history_date_product_id_idx on testing.stock_history sh  (cost=0.56..8.40 rows=1 width=24) (actual time=0.007..0.007 rows=1 loops=166060)
                   Output: sh.product_id, sh.date, sh.amount, sh.location_id, sh.id
                   Index Cond: ((sh.date = temp_stock_history.date) AND (sh.product_id = temp_stock_history.product_id))
                   Buffers: shared hit=830416
     Query Identifier: -3006587276448768070
     Planning:
       Buffers: shared hit=5
     Planning Time: 1.273 ms
     JIT:
       Functions: 25
       Options: Inlining false, Optimization false, Expressions true, Deforming true
       Timing: Generation 5.484 ms, Inlining 0.000 ms, Optimization 5.186 ms, Emission 57.921 ms, Total 68.592 ms
     Execution Time: 2269.782 ms
    (38 rows)

For the non-materialized CTE

Here, too, sh_ is read using the same index only scan with a cost of 13k, now with a slightly higher actual time. The result, however, is joined (Hash Join) against a parallel sequential scan for sh, together spanning a cost of 623k and an actual time of approximately 55 seconds.

                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------------
     Finalize Aggregate  (cost=624582.46..624582.47 rows=1 width=8) (actual time=55718.704..55718.867 rows=1 loops=1)
       Output: count(sh.id)
       Buffers: shared hit=4004 read=313651, temp read=138025 written=140224
       ->  Gather  (cost=624582.24..624582.45 rows=2 width=8) (actual time=55718.678..55718.847 rows=3 loops=1)
             Output: (PARTIAL count(sh.id))
             Workers Planned: 2
             Workers Launched: 2
             Buffers: shared hit=4004 read=313651, temp read=138025 written=140224
             ->  Partial Aggregate  (cost=623582.24..623582.25 rows=1 width=8) (actual time=55473.355..55473.366 rows=1 loops=3)
                   Output: PARTIAL count(sh.id)
                   Buffers: shared hit=4004 read=313651, temp read=138025 written=140224
                   Worker 0:  actual time=55191.692..55191.699 rows=1 loops=1
                     JIT:
                       Functions: 22
                       Options: Inlining true, Optimization true, Expressions true, Deforming true
                       Timing: Generation 2.507 ms, Inlining 486.144 ms, Optimization 525.587 ms, Emission 468.505 ms, Total 1482.744 ms
                     Buffers: shared hit=1051 read=106011, temp read=46508 written=47241
                   Worker 1:  actual time=55510.459..55510.476 rows=1 loops=1
                     JIT:
                       Functions: 22
                       Options: Inlining true, Optimization true, Expressions true, Deforming true
                       Timing: Generation 2.527 ms, Inlining 426.221 ms, Optimization 584.064 ms, Emission 414.532 ms, Total 1427.345 ms
                     Buffers: shared hit=1672 read=102879, temp read=45447 written=46180
                   ->  Hash Join  (cost=38467.42..623580.84 rows=562 width=8) (actual time=40909.238..55464.326 rows=59341 loops=3)
                         Output: sh.id
                         Inner Unique: true
                         Hash Cond: ((sh.date = (min(sh_.date))) AND (sh.product_id = sh_.product_id))
                         Buffers: shared hit=4004 read=313651, temp read=138025 written=140224
                         Worker 0:  actual time=40889.456..55184.132 rows=50169 loops=1
                           Buffers: shared hit=1051 read=106011, temp read=46508 written=47241
                         Worker 1:  actual time=40867.277..55498.697 rows=74635 loops=1
                           Buffers: shared hit=1672 read=102879, temp read=45447 written=46180
                         ->  Parallel Seq Scan on testing.stock_history sh  (cost=0.00..491830.58 rows=17768158 width=24) (actual time=2.414..25400.057 rows=14215230 loops=3)
                               Output: sh.product_id, sh.date, sh.amount, sh.location_id, sh.id
                               Buffers: shared hit=1655 read=312494
                               Worker 0:  actual time=3.281..25027.686 rows=14374447 loops=1
                                 Buffers: shared hit=477 read=105412
                               Worker 1:  actual time=1.938..25286.905 rows=14034545 loops=1
                                 Buffers: shared hit=667 read=102711
                         ->  Hash  (cost=36921.66..36921.66 rows=103051 width=16) (actual time=2972.388..2972.392 rows=166060 loops=3)
                               Output: (min(sh_.date)), sh_.product_id
                               Buckets: 262144 (originally 131072)  Batches: 2 (originally 1)  Memory Usage: 6145kB
                               Buffers: shared hit=2325 read=1157, temp read=1482 written=4773
                               Worker 0:  actual time=2990.050..2990.052 rows=166060 loops=1
                                 Buffers: shared hit=562 read=599, temp read=494 written=1591
                               Worker 1:  actual time=2921.164..2921.169 rows=166060 loops=1
                                 Buffers: shared hit=993 read=168, temp read=494 written=1591
                               ->  HashAggregate  (cost=32040.93..35891.15 rows=103051 width=16) (actual time=2149.388..2720.948 rows=166060 loops=3)
                                     Output: min(sh_.date), sh_.product_id
                                     Group Key: sh_.product_id
                                     Batches: 5  Memory Usage: 8241kB  Disk Usage: 6928kB
                                     Buffers: shared hit=2325 read=1157, temp read=1482 written=3681
                                     Worker 0:  actual time=2111.476..2692.948 rows=166060 loops=1
                                       Batches: 5  Memory Usage: 8241kB  Disk Usage: 6928kB
                                       Buffers: shared hit=562 read=599, temp read=494 written=1227
                                     Worker 1:  actual time=2120.938..2620.618 rows=166060 loops=1
                                       Batches: 5  Memory Usage: 8241kB  Disk Usage: 6928kB
                                       Buffers: shared hit=993 read=168, temp read=494 written=1227
                                     ->  Index Only Scan using stock_history_date_product_id_idx on testing.stock_history sh_  (cost=0.56..13543.65 rows=288738 width=16) (actual time=3.315..383.314 rows=315682 loops=3)
                                           Output: sh_.date, sh_.product_id
                                           Index Cond: (sh_.date >= '2024-01-18 11:00:00+00'::timestamp with time zone)
                                           Heap Fetches: 0
                                           Buffers: shared hit=2325 read=1157
                                           Worker 0:  actual time=0.109..366.573 rows=315682 loops=1
                                             Buffers: shared hit=562 read=599
                                           Worker 1:  actual time=0.145..413.987 rows=315682 loops=1
                                             Buffers: shared hit=993 read=168
     Query Identifier: -1633357988547932029
     Planning:
       Buffers: shared hit=249 read=10
     Planning Time: 21.322 ms
     JIT:
       Functions: 68
       Options: Inlining true, Optimization true, Expressions true, Deforming true
       Timing: Generation 8.789 ms, Inlining 1339.079 ms, Optimization 1789.996 ms, Emission 1220.012 ms, Total 4357.877 ms
     Execution Time: 55835.100 ms
    (76 rows)

For the non-materialized CTE, with sequential scans disabled

This query plan has sequential scans disabled, in order to compare the cost of the index scan the planner chooses instead. The planner now picks the same strategy as for the materialized CTE. As expected, the costs of the Index Only Scan on sh_ and its parent HashAggregate are unchanged, since they are the same operations. The Nested Loop has a cost of about 734k on top of its startup cost (766k total) and finishes about 1.9 s into execution.

    => begin;
    BEGIN
    =*> SET enable_seqscan=off;
    SET
    =*> EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
    WITH temp_stock_history AS (
        SELECT MIN(date) AS date, product_id
        FROM testing.stock_history SH_
        WHERE SH_.date >= '2024-01-18 12:00+0100'
        GROUP BY SH_.product_id
    )
    SELECT COUNT(id)
    FROM testing.stock_history SH
    WHERE (date, product_id) IN (
        SELECT date, product_id
        FROM temp_stock_history
    );
                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=766020.37..766020.38 rows=1 width=8) (actual time=1898.192..1898.194 rows=1 loops=1)
       Output: count(sh.id)
       Buffers: shared hit=831576, temp read=494 written=1227
       ->  Nested Loop  (cost=32041.50..766017.00 rows=1348 width=8) (actual time=511.515..1868.744 rows=178022 loops=1)
             Output: sh.id
             Buffers: shared hit=831576, temp read=494 written=1227
             ->  HashAggregate  (cost=32040.93..35891.15 rows=103051 width=16) (actual time=511.423..698.215 rows=166060 loops=1)
                   Output: min(sh_.date), sh_.product_id
                   Group Key: sh_.product_id
                   Batches: 5  Memory Usage: 8241kB  Disk Usage: 6928kB
                   Buffers: shared hit=1160, temp read=494 written=1227
                   ->  Index Only Scan using stock_history_date_product_id_idx on testing.stock_history sh_  (cost=0.56..13543.65 rows=288738 width=16) (actual time=0.113..96.657 rows=315682 loops=1)
                         Output: sh_.date, sh_.product_id
                         Index Cond: (sh_.date >= '2024-01-18 11:00:00+00'::timestamp with time zone)
                         Heap Fetches: 0
                         Buffers: shared hit=1160
             ->  Index Scan using stock_history_date_product_id_idx on testing.stock_history sh  (cost=0.56..7.07 rows=1 width=24) (actual time=0.006..0.006 rows=1 loops=166060)
                   Output: sh.product_id, sh.date, sh.amount, sh.location_id, sh.id
                   Index Cond: ((sh.date = (min(sh_.date))) AND (sh.product_id = sh_.product_id))
                   Buffers: shared hit=830416
     Query Identifier: -1633357988547932029
     Planning:
       Buffers: shared hit=5
     Planning Time: 1.189 ms
     JIT:
       Functions: 17
       Options: Inlining true, Optimization true, Expressions true, Deforming true
       Timing: Generation 4.733 ms, Inlining 45.374 ms, Optimization 145.178 ms, Emission 87.737 ms, Total 283.022 ms
     Execution Time: 1906.803 ms
    (29 rows)

Query plan analysis

In all three cases, it seems that sh_ is built the same, but it is joined against sh in different ways.

  • The materialized Nested Loop has a cost of 89k, actual time ~1.5s.
  • The non-materialized Hash Join with Parallel Seq Scan has a total cost of ~623k, actual time ~55s.
  • The non-materialized Nested Loop (with seq scan disabled) has a cost of ~734k (766k total including startup), actual time ~1.4s.

The number of rows and the number of buffer reads aren't wildly different between the two "fast" strategies, yet their costs differ by a factor of about 8.

So there are two main questions after this:

  • Why does the "forced" non-materialized Nested Loop have a much higher cost than the materialized Nested Loop?
  • Why is the cost of the forced Nested Loop so wildly different from its actual time?
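
As a rough check on the first question, the two Nested Loop costs can be approximated as the estimated number of outer rows multiplied by the per-loop cost of the inner Index Scan. The sketch below uses only figures copied from the plans above; the column aliases are made up for illustration. It is a back-of-the-envelope reading of the plans, not a statement of how the planner actually derives its cost.

```sql
-- Figures copied from the EXPLAIN output above; aliases are illustrative only.
SELECT
    -- materialized plan: outer HashAggregate rows=10305, inner Index Scan cost ..8.40
    10305  * 8.40 AS materialized_inner_cost,
    -- forced plan: outer HashAggregate rows=103051, inner Index Scan cost ..7.07
    103051 * 7.07 AS forced_inner_cost,
    round((103051 * 7.07) / (10305 * 8.40), 1) AS cost_ratio;
```

This evaluates to roughly 86.6k and 728.6k, a ratio of about 8.4, which is close to the reported Nested Loop costs. If this reading holds, most of the gap would come from the outer row estimate (10305 versus 103051, against 166060 actual rows in both cases) rather than from the per-loop index cost.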

One hypothesis is that this has to do with how expensive PostgreSQL expects disk reads to be. The database volume is on network-attached storage in a cloud environment (same AZ), so it could be that PostgreSQL expects disk reads to be cheaper than they are in practice. A counterpoint is that the number of buffers read is the same in both fast plans, as far as I can tell.
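
One way to probe this hypothesis would be to look at the planner's cost settings and then re-run the query with a different random_page_cost inside a transaction. This is only a sketch: the value 1.1 is an arbitrary trial value chosen for illustration, not a recommendation and not a setting taken from my server.

```sql
-- Inspect the settings that feed the planner's I/O and parallelism estimates.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost', 'effective_cache_size',
               'work_mem', 'max_parallel_workers_per_gather', 'jit')
ORDER BY name;

-- Try a different random_page_cost for this transaction only.
BEGIN;
SET LOCAL random_page_cost = 1.1;  -- assumption: arbitrary trial value
EXPLAIN (ANALYZE, BUFFERS)
WITH temp_stock_history AS (
    SELECT MIN(date) AS date, product_id
    FROM testing.stock_history SH_
    WHERE SH_.date >= '2024-01-18 12:00+0100'
    GROUP BY SH_.product_id
)
SELECT COUNT(id)
FROM testing.stock_history SH
WHERE (date, product_id) IN (
    SELECT date, product_id
    FROM temp_stock_history
);
ROLLBACK;  -- SET LOCAL is discarded at the end of the transaction
```

Because SET LOCAL only lasts until the end of the transaction, the experiment leaves the session and server configuration untouched.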

    => SELECT version();
                                                                  version
    -----------------------------------------------------------------------------------------------------------------------------------
     PostgreSQL 15.2 (Ubuntu 15.2-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
    (1 row)
