For each category, find the count of foreign-key items in all child categories using a PostgreSQL Recursive CTE

I have a typical tree structure stored as an adjacency list in PostgreSQL 9.4:

    CREATE TABLE gear_category (
        id        INTEGER PRIMARY KEY,
        name      TEXT,
        parent_id INTEGER
    );

I also have a table of items, each attached to a category:

    CREATE TABLE gear_item (
        id          INTEGER PRIMARY KEY,
        name        TEXT,
        category_id INTEGER REFERENCES gear_category
    );

Any category can have attached gear items, not just the leaves.

For speed reasons, I want to pre-calculate some data about each category, which I'll use to generate a Materialized View.

Desired output:

    speedy_materialized_view (
        gear_category_id            INTEGER,
        count_direct_child_items    INTEGER,
        count_recursive_child_items INTEGER
    );

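count_direct_child_items is the number of gear items attached directly to the category. count_recursive_child_items is the cumulative number of gear items attached to the category itself or to any of its descendant categories (children, grandchildren, and so on). There should be one row for each category, with 0 for any count that is zero.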
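A minimal sketch of the `count_direct_child_items` column, run against temporary copies of the two tables filled with a small hypothetical dataset (Grandparent > Parent > Leaf plus an empty category; all names and ids are made up). The `LEFT JOIN` keeps categories that have no items, and `count(gear_item.id)` skips the resulting NULLs, which yields the required 0 instead of dropping the row. The temporary view name `direct_counts` is illustrative only.

```sql
-- Hypothetical sample data: Grandparent > Parent > Leaf, plus an empty category.
CREATE TEMP TABLE gear_category (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TEMP TABLE gear_item (id INTEGER PRIMARY KEY, name TEXT,
                             category_id INTEGER REFERENCES gear_category);
INSERT INTO gear_category VALUES
    (1, 'Grandparent', NULL), (2, 'Parent', 1), (3, 'Leaf', 2), (4, 'Empty', 1);
INSERT INTO gear_item VALUES (10, 'Tent', 1), (11, 'Stove', 3), (12, 'Pot', 3);

-- One row per category: the LEFT JOIN keeps categories without items, and
-- count(gear_item.id) ignores the NULL ids those rows carry, giving 0.
CREATE TEMP VIEW direct_counts AS
SELECT gear_category.id    AS gear_category_id,
       count(gear_item.id) AS count_direct_child_items
FROM gear_category
LEFT JOIN gear_item ON gear_item.category_id = gear_category.id
GROUP BY gear_category.id;

SELECT * FROM direct_counts ORDER BY gear_category_id;
```

With this data, Grandparent has 1 direct item, Leaf has 2, and Parent and the empty category both get 0.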

In order to calculate this, we need to use a recursive CTE to traverse the tree:

    WITH RECURSIVE children(id, parent_id) AS (
        -- base case
        SELECT gear_category.id AS id, gear_category.parent_id AS parent_id
        FROM gear_category
        WHERE gear_category.id = 37  -- filtering on id includes the current category;
                                     -- filtering on parent_id includes only its children
      -- combine with the recursive part
      UNION ALL
        SELECT gear_category.id, gear_category.parent_id
        FROM gear_category, children
        WHERE children.id = gear_category.parent_id
    )
    TABLE children;

It's simple to count the child gear items attached to this list of child categories:

    -- Either query replaces the final "TABLE children;" of the CTE above.

    -- Subselect variant
    SELECT count(gear_item.id) AS count_recursive_child_items_for_single_cat
    FROM gear_item
    WHERE gear_item.category_id IN (SELECT children.id AS children_id FROM children);

    -- JOIN variant
    SELECT count(gear_item.id) AS count_recursive_child_items_for_single_cat
    FROM gear_item, children
    WHERE gear_item.category_id = children.id;

But the CTE hard-codes the starting category ID 37, and I can't figure out how to combine these queries to produce count_recursive_child_items for every category, not just a single one.
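One common way to remove the hard-coded 37 is to seed the recursion with every category at once and carry the starting category's id through each step, then group by it. Below is a sketch using the same kind of hypothetical sample data. The names `descendants`, `root_id` and the temporary view `recursive_counts` are illustrative and not part of the schema. It also assumes the tree contains no cycles.

```sql
-- Hypothetical sample data: Grandparent > Parent > Leaf, plus an empty category.
CREATE TEMP TABLE gear_category (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TEMP TABLE gear_item (id INTEGER PRIMARY KEY, name TEXT,
                             category_id INTEGER REFERENCES gear_category);
INSERT INTO gear_category VALUES
    (1, 'Grandparent', NULL), (2, 'Parent', 1), (3, 'Leaf', 2), (4, 'Empty', 1);
INSERT INTO gear_item VALUES (10, 'Tent', 1), (11, 'Stove', 3), (12, 'Pot', 3);

CREATE TEMP VIEW recursive_counts AS
WITH RECURSIVE descendants(root_id, id) AS (
    -- Base case: every category starts its own walk and counts as its own
    -- descendant (the "include current object" behaviour from the CTE above).
    SELECT gear_category.id, gear_category.id
    FROM gear_category
  UNION ALL
    -- Recursive step: keep root_id fixed and move one level down the tree.
    -- (A cycle in parent_id would make this recurse forever.)
    SELECT descendants.root_id, gear_category.id
    FROM descendants
    JOIN gear_category ON gear_category.parent_id = descendants.id
)
SELECT descendants.root_id AS gear_category_id,
       count(gear_item.id) AS count_recursive_child_items
FROM descendants
LEFT JOIN gear_item ON gear_item.category_id = descendants.id
GROUP BY descendants.root_id;

SELECT * FROM recursive_counts ORDER BY gear_category_id;
```

With this data, the query should report 3 items for Grandparent (1 direct item plus the 2 on Leaf), 2 each for Parent and Leaf, and 0 for the empty category. Every root appears at least once in `descendants` (as its own descendant), so the `LEFT JOIN` still produces a row with count 0 for categories that have no items anywhere below them.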

How do I combine these?

Also, for each category I currently compute all of its descendant categories, which duplicates a lot of work, and I'm not sure how to avoid that. For example, with Grandparent > Parent > Leaf, I compute the descendants of Grandparent and of Parent separately, so the Parent > Leaf relationship gets computed twice.

And since I'm already returning count_direct_child_items for each category, it might be faster to sum those when calculating count_recursive_child_items instead of recounting gear items from scratch, as I do now.

Individually, each of these concepts makes sense to me; I just can't figure out how to combine them into a single elegant, optimized query.
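A sketch that combines both columns in one statement and reuses the direct counts instead of recounting items. It computes each category's direct count once in a plain CTE, then walks *upward* from every category to its ancestors, passing that count along, and sums per category. The sample data is hypothetical. The names `direct`, `upward`, `n` and the temporary view `speedy_counts` are illustrative. The query assumes the tree contains no cycles. Whether this is actually faster than the downward walk depends on the real data and would need checking with `EXPLAIN ANALYZE`.

```sql
-- Hypothetical sample data: Grandparent > Parent > Leaf, plus an empty category.
CREATE TEMP TABLE gear_category (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TEMP TABLE gear_item (id INTEGER PRIMARY KEY, name TEXT,
                             category_id INTEGER REFERENCES gear_category);
INSERT INTO gear_category VALUES
    (1, 'Grandparent', NULL), (2, 'Parent', 1), (3, 'Leaf', 2), (4, 'Empty', 1);
INSERT INTO gear_item VALUES (10, 'Tent', 1), (11, 'Stove', 3), (12, 'Pot', 3);

CREATE TEMP VIEW speedy_counts AS
WITH RECURSIVE direct AS (
    -- Direct item count per category, computed once (0 for empty categories).
    SELECT gear_category.id, gear_category.parent_id,
           count(gear_item.id) AS n
    FROM gear_category
    LEFT JOIN gear_item ON gear_item.category_id = gear_category.id
    GROUP BY gear_category.id, gear_category.parent_id
),
upward(category_id, parent_id, n) AS (
    -- Base case: every category contributes its direct count to itself.
    SELECT id, parent_id, n FROM direct
  UNION ALL
    -- Recursive step: hand the same count on to the next ancestor up.
    -- (A cycle in parent_id would make this recurse forever.)
    SELECT gear_category.id, gear_category.parent_id, upward.n
    FROM upward
    JOIN gear_category ON gear_category.id = upward.parent_id
)
SELECT direct.id     AS gear_category_id,
       direct.n      AS count_direct_child_items,
       sum(upward.n) AS count_recursive_child_items
FROM direct
JOIN upward ON upward.category_id = direct.id
GROUP BY direct.id, direct.n;

SELECT * FROM speedy_counts ORDER BY gear_category_id;
```

This still produces one intermediate row per (ancestor, category) pair, so it does not remove the duplicated traversal described above. It only avoids joining `gear_item` once per pair. On PostgreSQL 9.4 the same `WITH RECURSIVE ... SELECT` could serve as the body of `CREATE MATERIALIZED VIEW speedy_materialized_view AS ...` over the real tables (a materialized view cannot reference temporary tables), rebuilt with `REFRESH MATERIALIZED VIEW speedy_materialized_view;`. In PostgreSQL, `count()` returns `bigint` and `sum(bigint)` returns `numeric`, so add `::integer` casts if the exact INTEGER column types matter.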

