Channel: Active questions tagged cte - Database Administrators Stack Exchange

Optimize Recursive CTE


I have the following query:

WITH RECURSIVE node_path AS (
    SELECT t1.record_id AS root_node,
           t2.destination_record_id AS end_node,
           ARRAY[t1.record_id, t2.destination_record_id] AS path_array
    FROM "report_entity_13aa9a01-1a1b-49dc-bfa4-b89a17e09d7c" t1
    JOIN record_link_report t2 ON t1.record_id = t2.source_record_id
    UNION ALL
    SELECT np.root_node,
           t2.destination_record_id AS end_node,
           ARRAY_APPEND(np.path_array, t2.destination_record_id)
    FROM node_path np
    JOIN record_link_report t2 ON np.path_array[array_upper(np.path_array, 1)] = t2.source_record_id
    WHERE NOT t2.destination_record_id = ANY(np.path_array)
)
SELECT t3.value, t4.value, t5.value
FROM node_path np1
JOIN node_path np2 ON np1.root_node = np2.root_node
JOIN node_path np3 ON np1.root_node = np3.root_node
JOIN "report_entity_68378509-6a8c-49df-be4f-312489f765ba" t3 ON t3.record_id = np1.end_node
JOIN "report_entity_742be900-6ade-4b7f-a811-0baba4124602" t4 ON t4.record_id = np2.end_node
JOIN "report_entity_a04fabdb-32d4-48fc-a27f-71c361aa8e84" t5 ON t5.record_id = np3.end_node
ORDER BY t3.value, t4.value

The report_entity_* tables are partitions of report_entity, and the table record_link_report stores the links between records.

In one simple case, the node_path CTE returns the following:

root_node | end_node
1         | 2
1         | 3
1         | 4

The partitions have:

record_id | value
2         | Two
3         | Three
4         | Four

The output is:

Two, Three, Four
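
To make the simple case above concrete, here is a minimal sketch that reproduces it on throwaway tables. The names links, root_entity, entity_a, entity_b and entity_c are hypothetical stand-ins for record_link_report and the four partitions, and integer ids replace the real uuid columns purely for readability.

```sql
-- Hypothetical miniature of the simple case: integer ids stand in for uuids,
-- and the table names below are illustrative, not the real schema.
CREATE TEMP TABLE links (source_record_id int, destination_record_id int);
CREATE TEMP TABLE root_entity (record_id int);
CREATE TEMP TABLE entity_a (record_id int, value text);
CREATE TEMP TABLE entity_b (record_id int, value text);
CREATE TEMP TABLE entity_c (record_id int, value text);

INSERT INTO root_entity VALUES (1);
INSERT INTO links VALUES (1, 2), (1, 3), (1, 4);
INSERT INTO entity_a VALUES (2, 'Two');
INSERT INTO entity_b VALUES (3, 'Three');
INSERT INTO entity_c VALUES (4, 'Four');

WITH RECURSIVE node_path AS (
    -- Anchor: every link that starts at a root record.
    SELECT r.record_id AS root_node,
           l.destination_record_id AS end_node,
           ARRAY[r.record_id, l.destination_record_id] AS path_array
    FROM root_entity r
    JOIN links l ON r.record_id = l.source_record_id
    UNION ALL
    -- Step: extend each path by one link, skipping nodes already on the path.
    SELECT np.root_node,
           l.destination_record_id,
           array_append(np.path_array, l.destination_record_id)
    FROM node_path np
    JOIN links l ON np.path_array[array_upper(np.path_array, 1)] = l.source_record_id
    WHERE NOT l.destination_record_id = ANY (np.path_array)
)
SELECT a.value, b.value, c.value
FROM node_path np1
JOIN node_path np2 ON np1.root_node = np2.root_node
JOIN node_path np3 ON np1.root_node = np3.root_node
JOIN entity_a a ON a.record_id = np1.end_node
JOIN entity_b b ON b.record_id = np2.end_node
JOIN entity_c c ON c.record_id = np3.end_node
ORDER BY a.value, b.value;
```

With this data the CTE yields the three rows (1, 2), (1, 3), (1, 4), and the final select returns the single row Two, Three, Four.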

The query works fine but is there any way to optimize this?

Sort  (cost=8874130246090504192.00..8922528065712964608.00 rows=19359127848984059904 width=23) (actual time=2270.322..2289.420 rows=100000 loops=1)
  Sort Key: t3.value, t4.value
  Sort Method: quicksort  Memory: 9631kB
  Buffers: shared hit=11085
  CTE node_path
    ->  Recursive Union  (cost=3281.02..5140335.15 rows=91829438 width=64) (actual time=407.161..1045.247 rows=300000 loops=1)
          Buffers: shared hit=7851
          ->  Hash Join  (cost=3281.02..13816.21 rows=300008 width=64) (actual time=407.159..624.422 rows=300000 loops=1)
                Hash Cond: (t2.source_record_id = t1.record_id)
                Buffers: shared hit=4441
                ->  Seq Scan on record_link_report t2  (cost=0.00..6410.08 rows=300008 width=32) (actual time=0.004..57.184 rows=300008 loops=1)
                      Buffers: shared hit=3410
                ->  Hash  (cost=2031.01..2031.01 rows=100001 width=16) (actual time=406.871..406.872 rows=100001 loops=1)
                      Buckets: 131072  Batches: 1  Memory Usage: 5712kB
                      Buffers: shared hit=1031
                      ->  Seq Scan on "report_entity_13aa9a01-1a1b-49dc-bfa4-b89a17e09d7c" t1  (cost=0.00..2031.01 rows=100001 width=16) (actual time=362.856..381.429 rows=100001 loops=1)
                            Buffers: shared hit=1031
          ->  Hash Join  (cost=10160.18..328993.02 rows=9152943 width=64) (actual time=297.311..297.313 rows=0 loops=1)
                Hash Cond: (np.path_array[array_upper(np.path_array, 1)] = t2_1.source_record_id)
                Join Filter: (t2_1.destination_record_id <> ALL (np.path_array))
                Buffers: shared hit=3410
                ->  WorkTable Scan on node_path np  (cost=0.00..60001.60 rows=3000080 width=48) (actual time=0.001..50.510 rows=300000 loops=1)
                ->  Hash  (cost=6410.08..6410.08 rows=300008 width=32) (actual time=141.968..141.969 rows=300008 loops=1)
                      Buckets: 524288  Batches: 1  Memory Usage: 22847kB
                      Buffers: shared hit=3410
                      ->  Seq Scan on record_link_report t2_1  (cost=0.00..6410.08 rows=300008 width=32) (actual time=0.022..57.608 rows=300008 loops=1)
                            Buffers: shared hit=3410
  ->  Merge Join  (cost=51707965.01..290387550236042944.00 rows=19359127848984059904 width=23) (actual time=1910.945..2216.450 rows=100000 loops=1)
        Merge Cond: (np1.root_node = np2.root_node)
        Buffers: shared hit=11085
        ->  Sort  (cost=17131318.17..17360891.77 rows=91829438 width=21) (actual time=1418.241..1443.674 rows=100000 loops=1)
              Sort Key: np1.root_node
              Sort Method: quicksort  Memory: 9323kB
              Buffers: shared hit=8882
              ->  Hash Join  (cost=3281.00..3102524.53 rows=91829438 width=21) (actual time=454.861..1383.738 rows=100000 loops=1)
                    Hash Cond: (np1.end_node = t3.record_id)
                    Buffers: shared hit=8882
                    ->  CTE Scan on node_path np1  (cost=0.00..1836588.76 rows=91829438 width=32) (actual time=407.163..1214.671 rows=300000 loops=1)
                          Buffers: shared hit=7851
                    ->  Hash  (cost=2031.00..2031.00 rows=100000 width=21) (actual time=47.422..47.423 rows=100000 loops=1)
                          Buckets: 131072  Batches: 1  Memory Usage: 6200kB
                          Buffers: shared hit=1031
                          ->  Seq Scan on "report_entity_68378509-6a8c-49df-be4f-312489f765ba" t3  (cost=0.00..2031.00 rows=100000 width=21) (actual time=0.009..20.136 rows=100000 loops=1)
                                Buffers: shared hit=1031
        ->  Materialize  (cost=34576646.84..737891762664.76 rows=42163228416979 width=50) (actual time=492.679..703.209 rows=100000 loops=1)
              Buffers: shared hit=2203
              ->  Merge Join  (cost=34576646.84..632483691622.31 rows=42163228416979 width=50) (actual time=492.676..653.190 rows=100000 loops=1)
                    Merge Cond: (np2.root_node = np3.root_node)
                    Buffers: shared hit=2203
                    ->  Sort  (cost=17445328.67..17674902.27 rows=91829438 width=31) (actual time=247.337..272.788 rows=100000 loops=1)
                          Sort Key: np2.root_node
                          Sort Method: quicksort  Memory: 10414kB
                          Buffers: shared hit=1172
                          ->  Hash Join  (cost=3422.00..3102665.53 rows=91829438 width=31) (actual time=49.675..213.451 rows=100000 loops=1)
                                Hash Cond: (np2.end_node = t4.record_id)
                                Buffers: shared hit=1172
                                ->  CTE Scan on node_path np2  (cost=0.00..1836588.76 rows=91829438 width=32) (actual time=0.001..60.441 rows=300000 loops=1)
                                ->  Hash  (cost=2172.00..2172.00 rows=100000 width=31) (actual time=49.404..49.405 rows=100000 loops=1)
                                      Buckets: 131072  Batches: 1  Memory Usage: 7254kB
                                      Buffers: shared hit=1172
                                      ->  Seq Scan on "report_entity_742be900-6ade-4b7f-a811-0baba4124602" t4  (cost=0.00..2172.00 rows=100000 width=31) (actual time=0.013..20.451 rows=100000 loops=1)
                                            Buffers: shared hit=1172
                    ->  Materialize  (cost=17131318.17..17590465.36 rows=91829438 width=19) (actual time=245.298..312.802 rows=100000 loops=1)
                          Buffers: shared hit=1031
                          ->  Sort  (cost=17131318.17..17360891.77 rows=91829438 width=19) (actual time=245.295..271.823 rows=100000 loops=1)
                                Sort Key: np3.root_node
                                Sort Method: quicksort  Memory: 9323kB
                                Buffers: shared hit=1031
                                ->  Hash Join  (cost=3281.00..3102524.53 rows=91829438 width=19) (actual time=48.970..210.833 rows=100000 loops=1)
                                      Hash Cond: (np3.end_node = t5.record_id)
                                      Buffers: shared hit=1031
                                      ->  CTE Scan on node_path np3  (cost=0.00..1836588.76 rows=91829438 width=32) (actual time=0.001..60.433 rows=300000 loops=1)
                                      ->  Hash  (cost=2031.00..2031.00 rows=100000 width=19) (actual time=48.697..48.698 rows=100000 loops=1)
                                            Buckets: 131072  Batches: 1  Memory Usage: 6092kB
                                            Buffers: shared hit=1031
                                            ->  Seq Scan on "report_entity_a04fabdb-32d4-48fc-a27f-71c361aa8e84" t5  (cost=0.00..2031.00 rows=100000 width=19) (actual time=0.014..19.910 rows=100000 loops=1)
                                                  Buffers: shared hit=1031
Planning:
  Buffers: shared hit=16
Planning Time: 0.583 ms
JIT:
  Functions: 68
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 3.347 ms, Inlining 9.956 ms, Optimization 216.984 ms, Emission 136.095 ms, Total 366.382 ms
Execution Time: 2339.415 ms

CREATE TABLE record_link_report (
    tenant_id uuid NOT NULL,
    source_record_id uuid NOT NULL,
    destination_record_id uuid NOT NULL,
    CONSTRAINT record_link_report_pkey PRIMARY KEY (tenant_id, source_record_id, destination_record_id)
);

CREATE TABLE "report_entity" (
    tenant_id uuid NOT NULL,
    attribute_id uuid NOT NULL,
    record_id uuid NOT NULL,
    entity_id uuid NOT NULL,
    value text NULL,
    CONSTRAINT "report_entity_pk" PRIMARY KEY (tenant_id, attribute_id, record_id)
) PARTITION BY LIST (entity_id);

Postgres 15

