I have the following query:

    WITH RECURSIVE node_path AS (
        SELECT t1.record_id AS root_node,
               t2.destination_record_id AS end_node,
               ARRAY[t1.record_id, t2.destination_record_id] AS path_array
        FROM "report_entity_13aa9a01-1a1b-49dc-bfa4-b89a17e09d7c" t1
        JOIN record_link_report t2 ON t1.record_id = t2.source_record_id
        UNION ALL
        SELECT np.root_node,
               t2.destination_record_id AS end_node,
               ARRAY_APPEND(np.path_array, t2.destination_record_id)
        FROM node_path np
        JOIN record_link_report t2
          ON np.path_array[array_upper(np.path_array, 1)] = t2.source_record_id
        WHERE NOT t2.destination_record_id = ANY(np.path_array)
    )
    SELECT t3.value, t4.value, t5.value
    FROM node_path np1
    JOIN node_path np2 ON np1.root_node = np2.root_node
    JOIN node_path np3 ON np1.root_node = np3.root_node
    JOIN "report_entity_68378509-6a8c-49df-be4f-312489f765ba" t3 ON t3.record_id = np1.end_node
    JOIN "report_entity_742be900-6ade-4b7f-a811-0baba4124602" t4 ON t4.record_id = np2.end_node
    JOIN "report_entity_a04fabdb-32d4-48fc-a27f-71c361aa8e84" t5 ON t5.record_id = np3.end_node
    ORDER BY t3.value, t4.value

The `report_entity_` tables are partitions, and the `record_link_report` table stores the connections between the records.
In one simple case, the `node_path` CTE returns the following:
root_node | end_node |
---|---|
1 | 2 |
1 | 3 |
1 | 4 |
The partitions have:
record_id | value |
---|---|
2 | Two |
3 | Three |
4 | Four |
The output is:
Two, Three, Four
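
To make the simple case above reproducible, here is a minimal sketch. The table names (`demo_root`, `demo_link`, `demo_part_a`, `demo_part_b`, `demo_part_c`) are hypothetical stand-ins, integer ids replace the real uuid columns, and it assumes each partition holds exactly one of the three rows shown above.

```sql
-- Hypothetical scratch tables; integer ids stand in for the real uuid columns.
CREATE TEMP TABLE demo_root   (record_id int PRIMARY KEY);
CREATE TEMP TABLE demo_link   (source_record_id int, destination_record_id int);
CREATE TEMP TABLE demo_part_a (record_id int PRIMARY KEY, value text);
CREATE TEMP TABLE demo_part_b (record_id int PRIMARY KEY, value text);
CREATE TEMP TABLE demo_part_c (record_id int PRIMARY KEY, value text);

INSERT INTO demo_root   VALUES (1);
INSERT INTO demo_link   VALUES (1, 2), (1, 3), (1, 4);
INSERT INTO demo_part_a VALUES (2, 'Two');    -- assumed: one row per partition
INSERT INTO demo_part_b VALUES (3, 'Three');
INSERT INTO demo_part_c VALUES (4, 'Four');

WITH RECURSIVE node_path AS (
    -- Anchor: direct links from each root record.
    SELECT r.record_id AS root_node,
           l.destination_record_id AS end_node,
           ARRAY[r.record_id, l.destination_record_id] AS path_array
    FROM demo_root r
    JOIN demo_link l ON r.record_id = l.source_record_id
    UNION ALL
    -- Recursive step: extend each path from its last element,
    -- skipping any node that is already on the path.
    SELECT np.root_node,
           l.destination_record_id,
           array_append(np.path_array, l.destination_record_id)
    FROM node_path np
    JOIN demo_link l
      ON np.path_array[array_upper(np.path_array, 1)] = l.source_record_id
    WHERE NOT l.destination_record_id = ANY (np.path_array)
)
SELECT a.value AS value_a, b.value AS value_b, c.value AS value_c
FROM node_path np1
JOIN node_path np2 ON np1.root_node = np2.root_node
JOIN node_path np3 ON np1.root_node = np3.root_node
JOIN demo_part_a a ON a.record_id = np1.end_node
JOIN demo_part_b b ON b.record_id = np2.end_node
JOIN demo_part_c c ON c.record_id = np3.end_node
ORDER BY a.value, b.value;
```

With this data the CTE yields three rows for root 1, so the three-way self-join produces 27 combinations before the partition joins reduce them to the single row shown above.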
The query works fine, but is there any way to optimize it? This is the execution plan:

    Sort  (cost=8874130246090504192.00..8922528065712964608.00 rows=19359127848984059904 width=23) (actual time=2270.322..2289.420 rows=100000 loops=1)
      Sort Key: t3.value, t4.value
      Sort Method: quicksort  Memory: 9631kB
      Buffers: shared hit=11085
      CTE node_path
        ->  Recursive Union  (cost=3281.02..5140335.15 rows=91829438 width=64) (actual time=407.161..1045.247 rows=300000 loops=1)
              Buffers: shared hit=7851
              ->  Hash Join  (cost=3281.02..13816.21 rows=300008 width=64) (actual time=407.159..624.422 rows=300000 loops=1)
                    Hash Cond: (t2.source_record_id = t1.record_id)
                    Buffers: shared hit=4441
                    ->  Seq Scan on record_link_report t2  (cost=0.00..6410.08 rows=300008 width=32) (actual time=0.004..57.184 rows=300008 loops=1)
                          Buffers: shared hit=3410
                    ->  Hash  (cost=2031.01..2031.01 rows=100001 width=16) (actual time=406.871..406.872 rows=100001 loops=1)
                          Buckets: 131072  Batches: 1  Memory Usage: 5712kB
                          Buffers: shared hit=1031
                          ->  Seq Scan on "report_entity_13aa9a01-1a1b-49dc-bfa4-b89a17e09d7c" t1  (cost=0.00..2031.01 rows=100001 width=16) (actual time=362.856..381.429 rows=100001 loops=1)
                                Buffers: shared hit=1031
              ->  Hash Join  (cost=10160.18..328993.02 rows=9152943 width=64) (actual time=297.311..297.313 rows=0 loops=1)
                    Hash Cond: (np.path_array[array_upper(np.path_array, 1)] = t2_1.source_record_id)
                    Join Filter: (t2_1.destination_record_id <> ALL (np.path_array))
                    Buffers: shared hit=3410
                    ->  WorkTable Scan on node_path np  (cost=0.00..60001.60 rows=3000080 width=48) (actual time=0.001..50.510 rows=300000 loops=1)
                    ->  Hash  (cost=6410.08..6410.08 rows=300008 width=32) (actual time=141.968..141.969 rows=300008 loops=1)
                          Buckets: 524288  Batches: 1  Memory Usage: 22847kB
                          Buffers: shared hit=3410
                          ->  Seq Scan on record_link_report t2_1  (cost=0.00..6410.08 rows=300008 width=32) (actual time=0.022..57.608 rows=300008 loops=1)
                                Buffers: shared hit=3410
      ->  Merge Join  (cost=51707965.01..290387550236042944.00 rows=19359127848984059904 width=23) (actual time=1910.945..2216.450 rows=100000 loops=1)
            Merge Cond: (np1.root_node = np2.root_node)
            Buffers: shared hit=11085
            ->  Sort  (cost=17131318.17..17360891.77 rows=91829438 width=21) (actual time=1418.241..1443.674 rows=100000 loops=1)
                  Sort Key: np1.root_node
                  Sort Method: quicksort  Memory: 9323kB
                  Buffers: shared hit=8882
                  ->  Hash Join  (cost=3281.00..3102524.53 rows=91829438 width=21) (actual time=454.861..1383.738 rows=100000 loops=1)
                        Hash Cond: (np1.end_node = t3.record_id)
                        Buffers: shared hit=8882
                        ->  CTE Scan on node_path np1  (cost=0.00..1836588.76 rows=91829438 width=32) (actual time=407.163..1214.671 rows=300000 loops=1)
                              Buffers: shared hit=7851
                        ->  Hash  (cost=2031.00..2031.00 rows=100000 width=21) (actual time=47.422..47.423 rows=100000 loops=1)
                              Buckets: 131072  Batches: 1  Memory Usage: 6200kB
                              Buffers: shared hit=1031
                              ->  Seq Scan on "report_entity_68378509-6a8c-49df-be4f-312489f765ba" t3  (cost=0.00..2031.00 rows=100000 width=21) (actual time=0.009..20.136 rows=100000 loops=1)
                                    Buffers: shared hit=1031
            ->  Materialize  (cost=34576646.84..737891762664.76 rows=42163228416979 width=50) (actual time=492.679..703.209 rows=100000 loops=1)
                  Buffers: shared hit=2203
                  ->  Merge Join  (cost=34576646.84..632483691622.31 rows=42163228416979 width=50) (actual time=492.676..653.190 rows=100000 loops=1)
                        Merge Cond: (np2.root_node = np3.root_node)
                        Buffers: shared hit=2203
                        ->  Sort  (cost=17445328.67..17674902.27 rows=91829438 width=31) (actual time=247.337..272.788 rows=100000 loops=1)
                              Sort Key: np2.root_node
                              Sort Method: quicksort  Memory: 10414kB
                              Buffers: shared hit=1172
                              ->  Hash Join  (cost=3422.00..3102665.53 rows=91829438 width=31) (actual time=49.675..213.451 rows=100000 loops=1)
                                    Hash Cond: (np2.end_node = t4.record_id)
                                    Buffers: shared hit=1172
                                    ->  CTE Scan on node_path np2  (cost=0.00..1836588.76 rows=91829438 width=32) (actual time=0.001..60.441 rows=300000 loops=1)
                                    ->  Hash  (cost=2172.00..2172.00 rows=100000 width=31) (actual time=49.404..49.405 rows=100000 loops=1)
                                          Buckets: 131072  Batches: 1  Memory Usage: 7254kB
                                          Buffers: shared hit=1172
                                          ->  Seq Scan on "report_entity_742be900-6ade-4b7f-a811-0baba4124602" t4  (cost=0.00..2172.00 rows=100000 width=31) (actual time=0.013..20.451 rows=100000 loops=1)
                                                Buffers: shared hit=1172
                        ->  Materialize  (cost=17131318.17..17590465.36 rows=91829438 width=19) (actual time=245.298..312.802 rows=100000 loops=1)
                              Buffers: shared hit=1031
                              ->  Sort  (cost=17131318.17..17360891.77 rows=91829438 width=19) (actual time=245.295..271.823 rows=100000 loops=1)
                                    Sort Key: np3.root_node
                                    Sort Method: quicksort  Memory: 9323kB
                                    Buffers: shared hit=1031
                                    ->  Hash Join  (cost=3281.00..3102524.53 rows=91829438 width=19) (actual time=48.970..210.833 rows=100000 loops=1)
                                          Hash Cond: (np3.end_node = t5.record_id)
                                          Buffers: shared hit=1031
                                          ->  CTE Scan on node_path np3  (cost=0.00..1836588.76 rows=91829438 width=32) (actual time=0.001..60.433 rows=300000 loops=1)
                                          ->  Hash  (cost=2031.00..2031.00 rows=100000 width=19) (actual time=48.697..48.698 rows=100000 loops=1)
                                                Buckets: 131072  Batches: 1  Memory Usage: 6092kB
                                                Buffers: shared hit=1031
                                                ->  Seq Scan on "report_entity_a04fabdb-32d4-48fc-a27f-71c361aa8e84" t5  (cost=0.00..2031.00 rows=100000 width=19) (actual time=0.014..19.910 rows=100000 loops=1)
                                                      Buffers: shared hit=1031
    Planning:
      Buffers: shared hit=16
    Planning Time: 0.583 ms
    JIT:
      Functions: 68
      Options: Inlining true, Optimization true, Expressions true, Deforming true
      Timing: Generation 3.347 ms, Inlining 9.956 ms, Optimization 216.984 ms, Emission 136.095 ms, Total 366.382 ms
    Execution Time: 2339.415 ms

Table definitions:

    CREATE TABLE record_link_report (
        tenant_id uuid NOT NULL,
        source_record_id uuid NOT NULL,
        destination_record_id uuid NOT NULL,
        CONSTRAINT record_link_report_pkey PRIMARY KEY (tenant_id, source_record_id, destination_record_id)
    );

    CREATE TABLE "report_entity" (
        tenant_id uuid NOT NULL,
        attribute_id uuid NOT NULL,
        record_id uuid NOT NULL,
        entity_id uuid NOT NULL,
        value text NULL,
        CONSTRAINT "report_entity_pk" PRIMARY KEY (tenant_id, attribute_id, record_id)
    ) PARTITION BY LIST (entity_id);

Postgres 15