
Updating table of versioned rows with historical records in PostgreSQL

I have a master table of versioned rows:

    CREATE TABLE master (
        id           SERIAL PRIMARY KEY,
        rec_id       integer,
        val          text,
        valid_on     date[],
        valid_during daterange
    );

    INSERT INTO master (rec_id, val, valid_on, valid_during) VALUES
        (1, 'a', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
        (2, 'b', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
        (3, 'c', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)');

    SELECT * FROM master ORDER BY rec_id, id;
    /*
     id | rec_id | val |        valid_on         |     valid_during
    ----+--------+-----+-------------------------+-----------------------
      1 |      1 | a   | {2015-01-01,2015-01-05} | [2015-01-01,infinity)
      2 |      2 | b   | {2015-01-01,2015-01-05} | [2015-01-01,infinity)
      3 |      3 | c   | {2015-01-01,2015-01-05} | [2015-01-01,infinity)
    */

The rec_id is the record's natural key, valid_on is an array of dates on which the record was valid, and valid_during is a date range describing the interval during which the record is valid. (The upper bound of valid_during is 'infinity' if no other row with the same rec_id has a more recent valid_on value.)

Given a second table of updated records, along with new dates on which each record was valid:

    CREATE TABLE updates (
        id       SERIAL PRIMARY KEY,
        rec_id   integer,
        val      text,
        valid_on date
    );

    INSERT INTO updates (rec_id, val, valid_on) VALUES
        (1, 'a', '2015-01-03'),  -- (1) same val for rec_id 1: just add the valid_on date
        (2, 'd', '2015-01-06'),  -- (2) different val for rec_id 2
        (3, 'e', '2015-01-03');  -- (3) different val for rec_id 3, with a new date
                                 --     that falls inside the old date range

    SELECT * FROM updates;
    /*
     id | rec_id | val |  valid_on
    ----+--------+-----+------------
      1 |      1 | a   | 2015-01-03
      2 |      2 | d   | 2015-01-06
      3 |      3 | e   | 2015-01-03
    */

I would like to insert/update the master table to wind up with something like this:

    -- The goal
    SELECT rec_id, val, valid_on, valid_during FROM master ORDER BY rec_id, id;
    /*
     rec_id | val |              valid_on              |      valid_during
    --------+-----+------------------------------------+-------------------------
          1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
          2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,2015-01-06)
          2 | d   | {2015-01-06}                       | [2015-01-06,infinity)
          3 | c   | {2015-01-01}                       | [2015-01-01,2015-01-03)
          3 | e   | {2015-01-03}                       | [2015-01-03,2015-01-05)
          3 | c   | {2015-01-05}                       | [2015-01-05,infinity)
    */

Specifically:

  • If a new record's rec_id exists in the master table with the same val, but the new valid_on date is not in the valid_on array in the master, simply add the new date to the master table's valid_on field (see rec_id 1)
  • If a new record's rec_id exists with a different val, insert the new record into the master table. The old record in the master table should have its valid_during value end on the date of the new record's valid_on (see rec_id 2)
  • If the new record's valid_on date falls within the old record's valid_during range and the old record also has a later valid_on date, the old record should appear on both "sides" of the updated record (see rec_id 3)
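To see which of the three rules applies to each pending update, the rules can be written as a CASE expression. One wrinkle: every existing valid_during in the sample data is open-ended ([2015-01-01,infinity)), so the new dates for both rec_id 2 and rec_id 3 fall inside it, and a range test alone cannot separate case 2 from case 3. The sketch below assumes the deciding test is whether the rec_id already has a valid_on date later than the new one. It runs against temporary copies of the question's tables; the update_cases table and the case labels (including 'no change') are hypothetical names for illustration.

```sql
-- Temporary copies of the question's tables (same schema and sample data).
CREATE TEMP TABLE master (
    id           SERIAL PRIMARY KEY,
    rec_id       integer,
    val          text,
    valid_on     date[],
    valid_during daterange
);
INSERT INTO master (rec_id, val, valid_on, valid_during) VALUES
    (1, 'a', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
    (2, 'b', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
    (3, 'c', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)');

CREATE TEMP TABLE updates (
    id       SERIAL PRIMARY KEY,
    rec_id   integer,
    val      text,
    valid_on date
);
INSERT INTO updates (rec_id, val, valid_on) VALUES
    (1, 'a', '2015-01-03'),
    (2, 'd', '2015-01-06'),
    (3, 'e', '2015-01-03');

-- Hypothetical helper table: one label per pending update.
CREATE TEMP TABLE update_cases AS
SELECT u.rec_id, u.val, u.valid_on,
       CASE
           -- same val and the date is already recorded: nothing to do
           WHEN EXISTS (SELECT 1 FROM master m
                        WHERE m.rec_id = u.rec_id AND m.val = u.val
                          AND m.valid_on @> ARRAY[u.valid_on])
               THEN 'no change'
           -- same val, new date: case 1, append to valid_on
           WHEN EXISTS (SELECT 1 FROM master m
                        WHERE m.rec_id = u.rec_id AND m.val = u.val)
               THEN 'case 1: append date'
           -- different val, and some existing row has a later date: case 3 (assumed test)
           WHEN EXISTS (SELECT 1 FROM master m, unnest(m.valid_on) AS x(d)
                        WHERE m.rec_id = u.rec_id AND x.d > u.valid_on)
               THEN 'case 3: split existing version'
           -- otherwise the new val starts the latest version: case 2
           ELSE 'case 2: new latest version'
       END AS update_case
FROM updates u;

SELECT * FROM update_cases ORDER BY rec_id;
```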

I can get most of the way there. The first case is straightforward: we just need to update the valid_on field in the master table (we'll worry about the valid_during field momentarily in a separate step):

    UPDATE master m
    SET valid_on = m.valid_on || u.valid_on
    FROM updates u
    WHERE m.rec_id = u.rec_id
      AND m.val = u.val
      AND NOT m.valid_on @> ARRAY[u.valid_on];

    SELECT * FROM master ORDER BY rec_id, id;
    /*
     id | rec_id | val |              valid_on              |     valid_during
    ----+--------+-----+------------------------------------+-----------------------
      1 |      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
      2 |      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
      3 |      3 | c   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
    */
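The goal output keeps valid_on in insertion order ({2015-01-01,2015-01-05,2015-01-03}). If sorted arrays would be easier to work with later, a variant of the case 1 update can rebuild the array from a sorted, de-duplicated unnest() instead of appending. This is an optional sketch on temporary copies of the tables, not a required step.

```sql
-- Temporary copies of the question's tables (same schema and sample data).
CREATE TEMP TABLE master (
    id           SERIAL PRIMARY KEY,
    rec_id       integer,
    val          text,
    valid_on     date[],
    valid_during daterange
);
INSERT INTO master (rec_id, val, valid_on, valid_during) VALUES
    (1, 'a', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
    (2, 'b', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
    (3, 'c', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)');

CREATE TEMP TABLE updates (
    id       SERIAL PRIMARY KEY,
    rec_id   integer,
    val      text,
    valid_on date
);
INSERT INTO updates (rec_id, val, valid_on) VALUES
    (1, 'a', '2015-01-03'),
    (2, 'd', '2015-01-06'),
    (3, 'e', '2015-01-03');

-- Case 1, but the merged array is rebuilt sorted and without duplicates.
UPDATE master m
SET valid_on = ARRAY(
        SELECT DISTINCT x.d
        FROM unnest(m.valid_on || u.valid_on) AS x(d)
        ORDER BY x.d)
FROM updates u
WHERE m.rec_id = u.rec_id
  AND m.val = u.val
  AND NOT m.valid_on @> ARRAY[u.valid_on];

SELECT rec_id, val, valid_on FROM master ORDER BY rec_id, id;
```

A caveat that applies to both versions: the PostgreSQL documentation notes that when a target row of UPDATE ... FROM joins to more than one row of the FROM list, only one of those rows is used and which one is not readily predictable. If updates can hold several new dates for the same (rec_id, val), aggregating them per pair first would be safer.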

For case #2, we can do a simple insert:

    INSERT INTO master (rec_id, val, valid_on)
    SELECT u.rec_id, u.val, ARRAY[u.valid_on]
    FROM updates u
    LEFT JOIN master m ON u.rec_id = m.rec_id AND u.val = m.val
    WHERE m.id IS NULL;

    SELECT * FROM master ORDER BY rec_id, id;
    /*
     id | rec_id | val |              valid_on              |     valid_during
    ----+--------+-----+------------------------------------+-----------------------
      1 |      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
      2 |      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
      4 |      2 | d   | {2015-01-06}                       |
      3 |      3 | c   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
      5 |      3 | e   | {2015-01-03}                       |
    */
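A variant of the case 2 insert, sketched under the assumption that several pending updates sharing the same new (rec_id, val) pair should become a single version: it uses NOT EXISTS in place of the LEFT JOIN ... IS NULL anti-join and collects the dates with array_agg. With the question's sample data it inserts the same two rows as the original query. It assumes master already reflects the case 1 step and works on temporary tables.

```sql
-- Temporary copy of master after the case 1 step.
CREATE TEMP TABLE master (
    id           SERIAL PRIMARY KEY,
    rec_id       integer,
    val          text,
    valid_on     date[],
    valid_during daterange
);
INSERT INTO master (rec_id, val, valid_on, valid_during) VALUES
    (1, 'a', '{2015-01-01,2015-01-05,2015-01-03}', '[2015-01-01,infinity)'),
    (2, 'b', '{2015-01-01,2015-01-05}',            '[2015-01-01,infinity)'),
    (3, 'c', '{2015-01-01,2015-01-05}',            '[2015-01-01,infinity)');

CREATE TEMP TABLE updates (
    id       SERIAL PRIMARY KEY,
    rec_id   integer,
    val      text,
    valid_on date
);
INSERT INTO updates (rec_id, val, valid_on) VALUES
    (1, 'a', '2015-01-03'),
    (2, 'd', '2015-01-06'),
    (3, 'e', '2015-01-03');

-- Case 2: one new row per (rec_id, val) pair that master does not have yet,
-- with all of that pair's pending dates gathered into one sorted array.
INSERT INTO master (rec_id, val, valid_on)
SELECT u.rec_id, u.val, array_agg(u.valid_on ORDER BY u.valid_on)
FROM updates u
WHERE NOT EXISTS (
    SELECT 1 FROM master m
    WHERE m.rec_id = u.rec_id AND m.val = u.val
)
GROUP BY u.rec_id, u.val;

SELECT * FROM master ORDER BY rec_id, id;
```

The grouping assumption breaks down if the same val reappears after a different val for one rec_id; that situation is really case 3 again.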

Now we can correct the valid_during ranges in one pass by joining to a subquery that uses the lead() window function to find the earliest valid_on date of the next row with the same rec_id:

    -- Helper function: returns the smallest element of a one-dimensional array
    CREATE OR REPLACE FUNCTION arraymin(anyarray) RETURNS anyelement AS $$
        SELECT min($1[i])
        FROM generate_series(array_lower($1, 1), array_upper($1, 1)) g(i);
    $$ LANGUAGE sql IMMUTABLE STRICT;

    -- Each row's range runs from its earliest valid_on date to the earliest
    -- valid_on date of the next row with the same rec_id (or 'infinity')
    UPDATE master m
    SET valid_during = daterange(arraymin(m.valid_on), t.new_valid_until)
    FROM (
        SELECT
            id,
            lead(arraymin(valid_on), 1, 'infinity'::date)
                OVER (PARTITION BY rec_id ORDER BY arraymin(valid_on)) AS new_valid_until
        FROM master
    ) t
    WHERE m.id = t.id;

    SELECT * FROM master ORDER BY rec_id, id;
    /*
     id | rec_id | val |              valid_on              |      valid_during
    ----+--------+-----+------------------------------------+-------------------------
      1 |      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
      2 |      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,2015-01-06)
      4 |      2 | d   | {2015-01-06}                       | [2015-01-06,infinity)
      3 |      3 | c   | {2015-01-01,2015-01-05}            | [2015-01-01,2015-01-03)
      5 |      3 | e   | {2015-01-03}                       | [2015-01-03,infinity)
    */
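The arraymin() helper can also be replaced by a scalar subquery over unnest(). The sketch below does that on a temporary copy of master in the state reached after the case 1 and case 2 steps (ids are inserted explicitly, so id is a plain integer here instead of SERIAL). It then checks the invariant implied by the column definitions, that every date in a row's valid_on lies inside its valid_during, using the range-contains-element operator @> with ALL. On this data it should flag only id 3, the 'c' row whose 2015-01-05 date now falls outside [2015-01-01,2015-01-03).

```sql
-- Temporary copy of master after the case 1 and case 2 steps.
CREATE TEMP TABLE master (
    id           integer PRIMARY KEY,
    rec_id       integer,
    val          text,
    valid_on     date[],
    valid_during daterange
);
INSERT INTO master (id, rec_id, val, valid_on, valid_during) VALUES
    (1, 1, 'a', '{2015-01-01,2015-01-05,2015-01-03}', '[2015-01-01,infinity)'),
    (2, 2, 'b', '{2015-01-01,2015-01-05}',            '[2015-01-01,infinity)'),
    (3, 3, 'c', '{2015-01-01,2015-01-05}',            '[2015-01-01,infinity)'),
    (4, 2, 'd', '{2015-01-06}',                        NULL),
    (5, 3, 'e', '{2015-01-03}',                        NULL);

-- Same window-function update, with the earliest date computed inline.
UPDATE master m
SET valid_during = daterange(t.first_day, t.new_valid_until)
FROM (
    SELECT id,
           first_day,
           lead(first_day, 1, 'infinity'::date)
               OVER (PARTITION BY rec_id ORDER BY first_day) AS new_valid_until
    FROM (
        SELECT id, rec_id,
               (SELECT min(x.d) FROM unnest(valid_on) AS x(d)) AS first_day
        FROM master
    ) s
) t
WHERE m.id = t.id;

-- Rows whose own valid_on dates are not all inside their valid_during.
SELECT id, rec_id, val, valid_on, valid_during
FROM master
WHERE NOT (valid_during @> ALL (valid_on))
ORDER BY id;
```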

And here's where I'm stuck: rec_id 1 and 2 are exactly what I want, but rec_id 3 is not. Its 'c' row still lists 2015-01-05 in valid_on even though its valid_during now ends on 2015-01-03, so a second 'c' row has to be inserted for 'c' to be valid again from 2015-01-05. I can't seem to wrap my head around the array operation needed to perform that insert. Any thoughts on approaches that don't involve unnesting the master table? Or is that the only/best approach here?

I'm using PostgreSQL 9.3 (but would happily upgrade to 9.4 if there's a graceful way to do this in the newer version).
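One approach that does unnest master, but in exchange skips the separate case 1, 2 and 3 steps entirely: explode every existing row into one (rec_id, val, day) fact per date, add the new facts from updates, and regroup. Within each rec_id, ordered by day, a new version starts wherever val differs from the previous day's val (a "gaps and islands" pattern built from lag() and a running sum()); each version's valid_during then runs from its first day to the next version's first day, or 'infinity'. This is a sketch, not a drop-in replacement: it assumes each (rec_id, day) pair carries exactly one val, uses only features available in 9.3, and works on temporary copies of the tables, writing its result into a hypothetical rebuilt table.

```sql
-- Temporary copies of the question's tables (original sample data).
CREATE TEMP TABLE master (
    id           SERIAL PRIMARY KEY,
    rec_id       integer,
    val          text,
    valid_on     date[],
    valid_during daterange
);
INSERT INTO master (rec_id, val, valid_on, valid_during) VALUES
    (1, 'a', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
    (2, 'b', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
    (3, 'c', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)');

CREATE TEMP TABLE updates (
    id       SERIAL PRIMARY KEY,
    rec_id   integer,
    val      text,
    valid_on date
);
INSERT INTO updates (rec_id, val, valid_on) VALUES
    (1, 'a', '2015-01-03'),
    (2, 'd', '2015-01-06'),
    (3, 'e', '2015-01-03');

CREATE TEMP TABLE rebuilt AS
WITH facts AS (
    -- one row per (rec_id, val, day): existing dates unnested, new dates added;
    -- UNION drops exact duplicates
    SELECT m.rec_id, m.val, x.d AS valid_day
    FROM master m, unnest(m.valid_on) AS x(d)
    UNION
    SELECT u.rec_id, u.val, u.valid_on
    FROM updates u
), flagged AS (
    -- 1 where val differs from the previous day's val for the same rec_id
    SELECT rec_id, val, valid_day,
           CASE WHEN val IS DISTINCT FROM lag(val) OVER w THEN 1 ELSE 0 END AS starts_version
    FROM facts
    WINDOW w AS (PARTITION BY rec_id ORDER BY valid_day)
), numbered AS (
    -- running total of version starts = version number within each rec_id
    SELECT rec_id, val, valid_day,
           sum(starts_version) OVER (PARTITION BY rec_id ORDER BY valid_day) AS version_no
    FROM flagged
), versions AS (
    SELECT rec_id, version_no, val,
           array_agg(valid_day ORDER BY valid_day) AS valid_on,
           min(valid_day) AS first_day
    FROM numbered
    GROUP BY rec_id, version_no, val
)
-- each version is valid until the next version of the same rec_id starts
SELECT rec_id, val, valid_on,
       daterange(first_day,
                 lead(first_day, 1, 'infinity'::date)
                     OVER (PARTITION BY rec_id ORDER BY first_day)) AS valid_during
FROM versions
ORDER BY rec_id, first_day;

SELECT * FROM rebuilt;
```

On the sample data this should produce the six rows of the goal table, except that rec_id 1's valid_on comes out sorted ({2015-01-01,2015-01-03,2015-01-05}). Writing the result back could be done by deleting from master and inserting from rebuilt inside one transaction, which assigns new id values.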

