I have a master table of versioned rows:
```sql
CREATE TABLE master (
  id           SERIAL PRIMARY KEY,
  rec_id       integer,
  val          text,
  valid_on     date[],
  valid_during daterange
);

INSERT INTO master (rec_id, val, valid_on, valid_during) VALUES
  (1, 'a', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
  (2, 'b', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
  (3, 'c', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)');

SELECT * FROM master ORDER BY rec_id, id;
/*
 id | rec_id | val |        valid_on         |     valid_during
----+--------+-----+-------------------------+-----------------------
  1 |      1 | a   | {2015-01-01,2015-01-05} | [2015-01-01,infinity)
  2 |      2 | b   | {2015-01-01,2015-01-05} | [2015-01-01,infinity)
  3 |      3 | c   | {2015-01-01,2015-01-05} | [2015-01-01,infinity)
*/
```
The `rec_id` is the record's natural key, `valid_on` is an array of dates on which the record was valid, and `valid_during` is a date range describing the interval during which the record is valid. (The upper bound of `valid_during` is 'infinity' if there is no record with the same `rec_id` and a more recent `valid_on` value; otherwise it is the first date of that next record.)
Given a second table of updated records, along with new dates on which each record was valid:
```sql
CREATE TABLE updates (
  id       SERIAL PRIMARY KEY,
  rec_id   integer,
  val      text,
  valid_on date
);

INSERT INTO updates (rec_id, val, valid_on) VALUES
  (1, 'a', '2015-01-03'),  -- (1) same "val" for rec_id 1, just add valid_on date
  (2, 'd', '2015-01-06'),  -- (2) different val for rec_id 2
  (3, 'e', '2015-01-03');  -- (3) different val for rec_id 3 with new date
                           --     intersecting old date range

SELECT * FROM updates;
/*
 id | rec_id | val |  valid_on
----+--------+-----+------------
  1 |      1 | a   | 2015-01-03
  2 |      2 | d   | 2015-01-06
  3 |      3 | e   | 2015-01-03
*/
```
I would like to insert/update the master table to wind up with something like this:
```sql
-- The goal
SELECT rec_id, val, valid_on, valid_during FROM master ORDER BY rec_id, id;
/*
 rec_id | val |              valid_on              |      valid_during
--------+-----+------------------------------------+-------------------------
      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,2015-01-06)
      2 | d   | {2015-01-06}                       | [2015-01-06,infinity)
      3 | c   | {2015-01-01}                       | [2015-01-01,2015-01-03)
      3 | e   | {2015-01-03}                       | [2015-01-03,2015-01-05)
      3 | c   | {2015-01-05}                       | [2015-01-05,infinity)
*/
```
Specifically:
- If a new record's `rec_id` exists in the master table with the same `val`, but the new `valid_on` date is not in the master's `valid_on` array, simply add the new date to the master table's `valid_on` field (see `rec_id` 1).
- If a new record's `rec_id` exists with a different `val`, insert the new record into the master table. The old record in the master table should have its `valid_during` range end on the new record's `valid_on` date (see `rec_id` 2).
- If the new record's `valid_on` date falls inside the old record's `valid_during` range and the old record still has `valid_on` dates after it, the old record should appear on both "sides" of the updated record (see `rec_id` 3).
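To make the three rules concrete, here is a minimal sketch that decides which case a single update falls under, given the master row that is current on the update's date. The function `classify_update` and its parameter names are hypothetical (they are not part of the schema above), and the sketch assumes rule 3 means "the old row still has `valid_on` dates strictly after the new date", which is what separates `rec_id` 3 from `rec_id` 2 in the sample data.

```sql
-- Hypothetical helper (not part of the original schema): returns 1, 2 or 3
-- for the rule that applies to one update, given the master row whose
-- valid_during contains the update's date.
-- Assumption: rule 3 means "the old row has valid_on dates after the new date".
CREATE OR REPLACE FUNCTION classify_update(old_val text, old_dates date[],
                                           new_val text, new_date date)
RETURNS integer AS $$
  SELECT CASE
           WHEN old_val = new_val THEN 1           -- same val: append the date
           WHEN new_date < ANY (old_dates) THEN 3  -- later old dates: split old row
           ELSE 2                                  -- latest date: new row, close old range
         END;
$$ LANGUAGE sql IMMUTABLE;

-- The three sample updates, checked against the original master rows:
SELECT classify_update('a', '{2015-01-01,2015-01-05}', 'a', '2015-01-03');  -- 1
SELECT classify_update('b', '{2015-01-01,2015-01-05}', 'd', '2015-01-06');  -- 2
SELECT classify_update('c', '{2015-01-01,2015-01-05}', 'e', '2015-01-03');  -- 3
```

Against the real tables, the same call could be driven by a join such as `FROM updates u JOIN master m ON m.rec_id = u.rec_id AND m.valid_during @> u.valid_on`, run before `master` is modified, so that each update is matched to the row that is current on its date.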
I can get most of the way there. The first case is straightforward: we just need to update the `valid_on` field in the master table (we'll worry about the `valid_during` field momentarily, in a separate step):
```sql
UPDATE master m
SET valid_on = m.valid_on || u.valid_on
FROM updates u
WHERE m.rec_id = u.rec_id
  AND m.val = u.val
  AND NOT m.valid_on @> ARRAY[u.valid_on];

SELECT * FROM master ORDER BY rec_id, id;
/*
 id | rec_id | val |              valid_on              |     valid_during
----+--------+-----+------------------------------------+-----------------------
  1 |      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
  2 |      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
  3 |      3 | c   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
*/
```
For case #2 (which also adds the new row needed for case #3), we can do a simple insert, leaving `valid_during` empty for now:
```sql
INSERT INTO master (rec_id, val, valid_on)
SELECT u.rec_id, u.val, ARRAY[u.valid_on]
FROM updates u
LEFT JOIN master m ON u.rec_id = m.rec_id AND u.val = m.val
WHERE m.id IS NULL;

SELECT * FROM master ORDER BY rec_id, id;
/*
 id | rec_id | val |              valid_on              |     valid_during
----+--------+-----+------------------------------------+-----------------------
  1 |      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
  2 |      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
  4 |      2 | d   | {2015-01-06}                       |
  3 |      3 | c   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
  5 |      3 | e   | {2015-01-03}                       |
*/
```
Now we can correct the `valid_during` range in one pass by joining on a subquery that uses a window function (`lead`) to find the next valid date for a record with the same `rec_id`:
```sql
-- Helper function: smallest element of an array
CREATE OR REPLACE FUNCTION arraymin(anyarray) RETURNS anyelement AS $$
  SELECT min($1[i]) FROM generate_series(array_lower($1,1), array_upper($1,1)) g(i);
$$ LANGUAGE sql IMMUTABLE STRICT;

UPDATE master m
SET valid_during = daterange(arraymin(valid_on), new_valid_until)
FROM (
  SELECT id,
         lead(arraymin(valid_on), 1, 'infinity'::date)
           OVER (PARTITION BY rec_id ORDER BY arraymin(valid_on)) AS new_valid_until
  FROM master
) t
WHERE m.id = t.id;

SELECT * FROM master ORDER BY rec_id, id;
/*
 id | rec_id | val |              valid_on              |      valid_during
----+--------+-----+------------------------------------+-------------------------
  1 |      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
  2 |      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,2015-01-06)
  4 |      2 | d   | {2015-01-06}                       | [2015-01-06,infinity)
  3 |      3 | c   | {2015-01-01,2015-01-05}            | [2015-01-01,2015-01-03)
  5 |      3 | e   | {2015-01-03}                       | [2015-01-03,infinity)
*/
```
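For what it's worth, the `arraymin` helper can also be written with `unnest()`, which is available in 9.3. This is only an alternative sketch of the same helper for the one-dimensional date arrays used here; note that it still unnests each array inside the function, just not the master table as a whole.

```sql
-- Alternative sketch of arraymin(): expand the array with unnest() and take
-- min() over the elements. Same signature as the generate_series version,
-- so CREATE OR REPLACE swaps it in. An empty array yields NULL; a NULL
-- array yields NULL because the function is STRICT.
CREATE OR REPLACE FUNCTION arraymin(anyarray) RETURNS anyelement AS $$
  SELECT min(x) FROM unnest($1) AS x;
$$ LANGUAGE sql IMMUTABLE STRICT;

SELECT arraymin('{2015-01-05,2015-01-01,2015-01-03}'::date[]);  -- 2015-01-01
```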
And here's where I'm stuck: `rec_id` 1 and 2 are exactly what I want, but `rec_id` 3 needs to be inserted again so that it appears valid on '2015-01-05' (with that date moved out of the original row's `valid_on` array). I can't seem to wrap my head around the array operation needed to perform that insert. Any thoughts on approaches that don't involve unnesting the master table? Or is that the only/best approach here?
I'm using PostgreSQL 9.3 (but would happily upgrade to 9.4 if there's a graceful way to do this in the newer version).