I have following schema in PostgreSQL 12 (objects and their relations are versioned by date ranges):
CREATE TABLE tmp_deps ( id bigint, code text, name text, start_date date, end_date date);CREATE TABLE tmp_dep_parents ( department_id bigint, parent_id bigint, start_date date, end_date date);INSERT INTO tmp_deps (id, code, name, start_date, end_date)VALUES (1, '11111111', 'Root Dep', '1970-01-01', '9999-12-31'), (2, '11111112', 'Sub Root Dep', '1970-01-01', '9999-12-31'), (3, '11111113', 'Dep 1', '1970-01-01', '2021-02-23'), (3, '11111113', 'Dep 1', '2021-02-24', '9999-12-31'), (4, '11111114', 'Dep 2', '1970-01-01', '2020-01-01'), (4, '11111114', 'Dep 2', '2020-01-02', '9999-12-31'), (5, '11111115', 'Sub Dep 1', '1970-01-01', '9999-12-31'), (6, '11111116', 'Sub Dep 2', '1970-01-01', '9999-12-31');INSERT INTO tmp_dep_parents (department_id, parent_id, start_date, end_date)VALUES (2, 1, '1970-01-01', '9999-12-31'), (3, 2, '1970-01-01', '2005-12-31'), (3, 1, '2006-01-01', '2006-12-31'), (3, 2, '2007-01-01', '9999-12-31'), (4, 2, '1970-01-01', '2010-05-01'), (4, 3, '2010-05-02', '2010-05-03'), (4, 2, '2010-05-04', '9999-12-31'), (5, 3, '1970-01-01', '9999-12-31'), (6, 4, '1970-01-01', '9999-12-31');
objects and their relations are versioned by date ranges
It means that object with id=1 can be named as "Root Department" from 1 Jan 2000 to 31 Dec 2021, and can be named as "Super Department" from 1 Jan 2022 to 31 Dec 9999.
Object with id=3 can belong to object with id=2 from 1 Jan 1970 to 15 May 2003 and can belong to object with id=1 from 16 May 2003 to 31 Dec 9999
The hierarchy of objects can be changed at any time.I am using recursive CTE to create MATERIALIZED VIEW and I noticed duplicates when I filter objects and their relations by date. And also I need an ability to view historical data too that is the reason why I can't just filter objects and their relations by date inside CTE.My CTE is:
WITH RECURSIVE deps AS ( SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, 1::int AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep WHERE dep.code = '11111111' UNION SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, s.depth + 1 AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep INNER JOIN deps s ON dep.parent_id = s.id),all_deps AS ( SELECT dep.*, dp.parent_id, dp.start_date AS dep_par_start_date, dp.end_date AS dep_par_end_date FROM tmp_deps dep LEFT JOIN tmp_dep_parents dp ON dp.department_id = dep.id)SELECT * FROM deps
The problem with duplicates occurs when I add the "depth" output in CTE.And as I understand it correctly, the duplicates are hidden when I remove the "depth" output from CTE, but I can still get wrong data.My CTE contains all versions of objects and their relations.And I don't understand how to get the CTE to follow the correct hierarchy path.
I want to see the hierarchy of departments and the departments themselves as of 15 May 2020 or 9 June 2021 or any other date, that is why I put all data states (versions) to CTE.
Materialized view is created from RCTE:
CREATE MATERIALIZED VIEW departments_mat_view ASWITH RECURSIVE deps AS ( SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, 1::int AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep WHERE dep.code = '11111111' UNION SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, s.depth + 1 AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep INNER JOIN deps s ON dep.parent_id = s.id),all_deps AS ( SELECT dep.*, dp.parent_id, dp.start_date AS dep_par_start_date, dp.end_date AS dep_par_end_date FROM tmp_deps dep LEFT JOIN tmp_dep_parents dp ON dp.department_id = dep.id)SELECT * FROM deps
And here is select from MATERIALIZED VIEW:
SELECT * FROM departments_mat_viewWHERE dep_start_date <= '2021-06-09' AND dep_end_date >= '2021-06-09' AND (dep_par_start_date IS NULL OR dep_par_start_date <= '2021-06-09') AND (dep_par_end_date IS NULL OR dep_par_end_date >= '2021-06-09')
Here is output from CTE:
"id" "dep_start_date" "dep_end_date" "name" "code" "depth" "parent_id" "dep_par_start_date" "dep_par_end_date"1 "1970-01-01" "9999-12-31" "Root Dep" 11111111 1 NULL NULL NULL2 "1970-01-01" "9999-12-31" "Sub Root Dep" 11111112 2 1 "1970-01-01" "9999-12-31"3 "2021-02-24" "9999-12-31" "Dep 1" 11111113 3 2 "2007-01-01" "9999-12-31"4 "2020-01-02" "9999-12-31" "Dep 2" 11111114 3 2 "2010-05-04" "9999-12-31"5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 3 3 "1970-01-01" "9999-12-31"5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 4 3 "1970-01-01" "9999-12-31"6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 4 4 "1970-01-01" "9999-12-31"6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 5 4 "1970-01-01" "9999-12-31"
Desired output is:
"id" "dep_start_date" "dep_end_date" "name" "code" "depth" "parent_id" "dep_par_start_date" "dep_par_end_date"1 "1970-01-01" "9999-12-31" "Root Dep" 11111111 1 NULL NULL NULL2 "1970-01-01" "9999-12-31" "Sub Root Dep" 11111112 2 1 "1970-01-01" "9999-12-31"3 "2021-02-24" "9999-12-31" "Dep 1" 11111113 3 2 "2007-01-01" "9999-12-31"4 "2020-01-02" "9999-12-31" "Dep 2" 11111114 3 2 "2010-05-04" "9999-12-31"5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 4 3 "1970-01-01" "9999-12-31"6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 4 4 "1970-01-01" "9999-12-31"