tl;dr

Why does an `update ... from` statement make a CTE behave differently than an `update ... where` statement with a subquery in PostgreSQL?

Full Context

Disclaimer: The information below has been sanitized to maintain privacy.

I am working on a legacy system (the design is about 5 years old) with a PostgreSQL documents table that looks like this:

create table documents (
    uuid       uuid                     not null primary key,
    data       jsonb                    not null,
    created_dt timestamp with time zone not null,
    deleted_dt timestamp with time zone
);

The JSON in the data column looks roughly like this:

{
    "uuid": "959bd856-707a-4c5e-a76e-4c1496a6103a",
    "bucket": "fake_cloud_bucket_01",
    "md5Sum": "233ea5afd52636b83cf75f7fd39c1f2a",
    "contentType": "application/pdf",
    "tags": {
        "APPLICATION_UUID": "3e979827-36df-4bb0-9012-22cceb54d5e9",
        "IDENTITY_UUID": "9e137538-b0ad-4322-a6a4-2047bab984e4"
    },
    "createdDate": 1704757787264
}

The application that uses this database has a search by tag feature, so in the original design, an index was added to the documents table like this:

create index documents_tags_idxgin
    on documents using gin ((data -> 'tags'::text) jsonb_path_ops);

Now fast-forward 5 years to today and the documents table has some 20 million rows and a simple query like the one below takes about 3 minutes to finish, which is unacceptable:

select *
from documents
where data->'tags'->>'key' = 'IDENTITY_UUID'
  and data->'tags'->>'value' = '9e137538-b0ad-4322-a6a4-2047bab984e4';
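As an aside, with the tags object shaped as in the sample above, the tag names are themselves the JSON keys, so a lookup on keys literally named key and value would not match that sample. The jsonb_path_ops operator class supports the containment operator @>, so a query of the following form is one the existing GIN index can serve. This is only a sketch that assumes the sample shape is representative; it has not been run against the real table.

```sql
-- Containment lookup against the indexed expression (data -> 'tags').
-- Assumes tags is a flat object of tag name -> string value, as in the sample.
select *
from documents
where (data -> 'tags') @> '{"IDENTITY_UUID": "9e137538-b0ad-4322-a6a4-2047bab984e4"}'::jsonb;
```

Prefixing the statement with explain (analyze, buffers) would show whether the planner actually picks documents_tags_idxgin.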

To make the search by tag feature perform better, I decided to extract the tags into their own table and index that table directly. So now we have a new table called tags, with an index, that looks like this:

create table tags (
    document_uuid uuid not null references documents(uuid) on delete restrict,
    tag           text not null check (tag <> ''),
    value         text not null check (value <> ''),
    primary key (document_uuid, tag)
);
create index tags_tag_value_idx on tags (tag, value);

With the tags table in place, this query returns a result in single-digit milliseconds on average:

select D.*
from documents as D
join tags as T on D.uuid = T.document_uuid
where T.tag = 'IDENTITY_UUID'
  and T.value = '9e137538-b0ad-4322-a6a4-2047bab984e4';

The Question

To convert the jsonb tags object in the data column of the documents table into rows in the tags table, I wrote a query that expands the tags object of each document into one row per tag and inserts those rows into the tags table:

insert into tags (document_uuid, tag, value)
    select D.uuid, T.key, replace((T.value)::text, '"', '')
    from documents as D
        join jsonb_each(data->'tags') as T on true
    where D.tags_processed_on is null
        for update skip locked
    limit 2500
on conflict do nothing
returning document_uuid, tag, value

Given the jsonb object shown in the context section above, two rows are inserted into the tags table, and the rows returned by the above query look like this:

document_uuid, tag, value
'959bd856-707a-4c5e-a76e-4c1496a6103a', 'APPLICATION_UUID', '3e979827-36df-4bb0-9012-22cceb54d5e9'
'959bd856-707a-4c5e-a76e-4c1496a6103a', 'IDENTITY_UUID', '9e137538-b0ad-4322-a6a4-2047bab984e4'
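The replace((T.value)::text, '"', '') call in the insert above is there because casting a jsonb string to text keeps its surrounding double quotes. It would also remove any double quote that appears inside a value. A possible alternative, sketched here as a plain select for inspection only, is jsonb_each_text, which returns each value as text directly:

```sql
-- jsonb_each_text yields (key text, value text), so string values arrive
-- without surrounding quotes and no replace() call is needed.
select D.uuid, T.key, T.value
from documents as D
    join jsonb_each_text(D.data -> 'tags') as T on true
limit 10;
```

The limit 10 is an arbitrary choice to keep the preview small; it is not part of the original migration query.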

In order to keep track of the documents processed, we also added a temporary column to the documents table:

alter table documents add column tags_processed_on timestamp default null;

So my query to insert into the tags table and update documents as processed looked like this:

with docs_to_tags as (
    insert into tags (document_uuid, tag, value)
        select D.uuid, T.key, replace((T.value)::text, '"', '')
        from documents as D
            join jsonb_each(data->'tags') as T on true
        where D.tags_processed_on is null
            for update skip locked
        limit 2500
    on conflict do nothing
    returning document_uuid, tag, value
)
update documents as D1
set tags_processed_on = now()
from docs_to_tags as DTT
where D1.uuid = DTT.document_uuid;

The problem with the above CTE is that when I ran it, it inserted only the first tag for any given document and then moved on to the next document. When I instead referenced the CTE from a subquery in the where clause of the outer update statement, the insert into the tags table behaved as I expected and inserted all tags. The statement that works looks like this:

with docs_to_tags as (
    insert into tags (document_uuid, tag, value)
        select D.uuid, T.key, replace((T.value)::text, '"', '')
        from documents as D
            join jsonb_each(data->'tags') as T on true
        where D.tags_processed_on is null
            for update skip locked
        limit 2500
    on conflict do nothing
    returning document_uuid, tag, value
)
update documents as D1
set tags_processed_on = now()
where D1.uuid in (
    select distinct D2.document_uuid
    from docs_to_tags as D2
);
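To compare the two variants, one option is to list processed documents whose number of rows in tags differs from the number of keys in their jsonb tags object. This is a diagnostic sketch, not part of the migration. It assumes data->'tags' is always a JSON object or absent, and the aliases E, expected and actual are names introduced here for illustration.

```sql
-- For each processed document, compare the number of keys in data->'tags'
-- with the number of rows actually present in the tags table.
select D.uuid, E.expected, count(T.tag) as actual
from documents as D
    cross join lateral (
        select count(*) as expected
        from jsonb_object_keys(D.data -> 'tags')
    ) as E
    left join tags as T on T.document_uuid = D.uuid
where D.tags_processed_on is not null
group by D.uuid, E.expected
having count(T.tag) <> E.expected;
```

An empty result would mean every processed document has all of its tags in the tags table; any rows returned identify documents where some tags were skipped.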

So finally my question is this:

Why does an `update ... from` statement make a CTE behave differently than an `update ... where` statement with a subquery in PostgreSQL?

... or more succinctly: Why does the outer update statement have any influence on what is returned by the inner insert query inside of the CTE?
