Channel: Active questions tagged cte - Database Administrators Stack Exchange

Optimizing a CTE hierarchy


Update below

I have a table of accounts with a typical account/parent-account structure to represent a hierarchy of accounts (SQL Server 2012). I created a VIEW that uses a CTE to build out the hierarchy, and on the whole it works beautifully and as intended. I can query the hierarchy at any level and see the branches easily.

There is one business logic field that needs to be returned as a function of the hierarchy. A field in each account record describes the size of the business (we'll call it CustomerCount). The report needs to roll up CustomerCount over the whole branch. In other words, given an account, I need to sum the CustomerCount values for that account and for every descendant in every branch below it in the hierarchy.
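To make the rollup concrete, here is a minimal sketch for a single account. It copies the sample rows from the script further down into a table variable so it runs on its own; the names @Account, @AcctId and Branch are made up for the illustration and are not part of the real view.

```sql
-- Sample rows copied from the script further down; @Account is a stand-in for dbo.Account.
DECLARE @Account TABLE
(
    Acctid varchar(1) NOT NULL
    , Name varchar(30) NULL
    , ParentId varchar(1) NULL
    , CustomerCount int NULL
);

INSERT INTO @Account (Acctid, Name, ParentId, CustomerCount)
VALUES ('A', 'Best Bet', NULL, 21)
     , ('B', 'eStore', 'A', 30)
     , ('C', 'Big Bens', 'B', 75)
     , ('D', 'Mr. Jimbo', 'B', 50)
     , ('E', 'Dr. John', 'C', 100)
     , ('F', 'Brick', 'A', 222)
     , ('G', 'Mortar', 'C', 153);

-- Hypothetical parameter: the account whose branch is rolled up.
DECLARE @AcctId varchar(1) = 'B';

WITH Branch AS
(
    -- start at the requested account
    SELECT a.Acctid, a.CustomerCount
    FROM @Account AS a
    WHERE a.Acctid = @AcctId
    UNION ALL
    -- add every child of an account already in the branch
    SELECT child.Acctid, child.CustomerCount
    FROM @Account AS child
    INNER JOIN Branch AS parent ON child.ParentId = parent.Acctid
)
SELECT SUM(CustomerCount) AS TotalCustomerCount
FROM Branch;
```

For account B this adds B (30), its children C (75) and D (50), and C's children E (100) and G (153), giving 408. The view has to produce that number for every account at once, which is where it gets expensive.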

I successfully calculated the field using a hierarchy field built within the CTE, which looks like acct4.acct3.acct2.acct1. The problem I'm running into is simply making it run fast. Without this one calculated field, the query runs in ~3 seconds. When I add the calculated field, it turns into a 4-minute query.

Here is the best version I've been able to come up with that returns the correct results. I'm looking for ideas on how I can restructure this AS A VIEW without such huge sacrifices to performance.

I understand why this one is slow (it requires calculating a predicate in the WHERE clause), but I can't think of another way to structure it and still get the same results.
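As a small illustration of what that predicate does, the sketch below applies the same right() expression to a few hand-written hierarchy strings. The variable names and the listed paths are assumptions for the example, built from the single-character ids in the sample data.

```sql
-- Values for account B in the sample data: IdHierarchy 'B.A', HierarchyLevel 2.
DECLARE @IdLength int = 1;                  -- length of one id (1 character in the sample)
DECLARE @ParentPath varchar(4000) = 'B.A';
DECLARE @ParentLevel int = 2;

SELECT
    c.ChildPath
    -- same arithmetic as the view: id characters plus the periods in between
    , RIGHT(c.ChildPath, @IdLength * @ParentLevel + @ParentLevel - 1) AS Suffix
    , CASE
          WHEN RIGHT(c.ChildPath, @IdLength * @ParentLevel + @ParentLevel - 1) = @ParentPath
          THEN 1 ELSE 0
      END AS InBranch
FROM (VALUES ('B.A'), ('C.B.A'), ('E.C.B.A'), ('F.A')) AS c (ChildPath);
```

The first three paths end in 'B.A' and are flagged as part of B's branch; 'F.A' is not. Because the comparison wraps the child's IdHierarchy in a function, it has to be computed for each combination of outer row and child row.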

Here's some sample code to build a table and do the CTE pretty much exactly as it works in my environment.

USE tempdb
GO

CREATE TABLE dbo.Account
(
    Acctid varchar(1) NOT NULL
    , Name varchar(30) NULL
    , ParentId varchar(1) NULL
    , CustomerCount int NULL
);

INSERT Account
SELECT 'A','Best Bet',NULL,21 UNION ALL
SELECT 'B','eStore','A',30 UNION ALL
SELECT 'C','Big Bens','B',75 UNION ALL
SELECT 'D','Mr. Jimbo','B',50 UNION ALL
SELECT 'E','Dr. John','C',100 UNION ALL
SELECT 'F','Brick','A',222 UNION ALL
SELECT 'G','Mortar','C',153
;

With AccountHierarchy AS
(
    --Root values have no parent
    SELECT
        Root.AcctId                                         AccountId
        , Root.Name                                         AccountName
        , Root.ParentId                                     ParentId
        , 1                                                 HierarchyLevel
        , cast(Root.Acctid as varchar(4000))                IdHierarchy     --highest parent reads right to left as in id3.Acctid2.Acctid1
        , cast(replace(Root.Name,'.','') as varchar(4000))  NameHierarchy   --highest parent reads right to left as in name3.name2.name1 (replace '.' so name parse is easy in last step)
        , cast(Root.Acctid as varchar(4000))                HierarchySort   --reverse of above, read left to right name1.name2.name3 for sorting on reporting only
        , cast(Root.Name as varchar(4000))                  HierarchyLabel  --use for labels on reporting only, indents names under sorted hierarchy
        , Root.CustomerCount                                CustomerCount
    FROM
        tempdb.dbo.account Root
    WHERE
        Root.ParentID is null

    UNION ALL

    SELECT
        Recurse.Acctid                                      AccountId
        , Recurse.Name                                      AccountName
        , Recurse.ParentId                                  ParentId
        , Root.HierarchyLevel + 1                           HierarchyLevel  --next level in hierarchy
        , cast(cast(recurse.Acctid as varchar(40)) +'.'+ Root.IdHierarchy as varchar(4000))   IdHierarchy     --cast because in real system this is a uniqueidentifier type needs converting
        , cast(replace(recurse.Name,'.','') +'.'+ Root.NameHierarchy as varchar(4000))        NameHierarchy   --replace '.' for parsing in last step, cast to make room for lots of sub levels down the hierarchy
        , cast(Root.AccountName +'.'+ Recurse.Name as varchar(4000))                          HierarchySort
        , cast(space(root.HierarchyLevel * 4) + Recurse.Name as varchar(4000))                HierarchyLabel
        , Recurse.CustomerCount                             CustomerCount
    FROM
        tempdb.dbo.account Recurse INNER JOIN
        AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)
SELECT
    hier.AccountId
    , Hier.AccountName
    , hier.ParentId
    , hier.HierarchyLevel
    , hier.IdHierarchy
    , hier.NameHierarchy
    , hier.HierarchyLabel
    , parsename(hier.IdHierarchy,1) Acct1Id
    , parsename(hier.NameHierarchy,1) Acct1Name     --This is why we stripped out '.' during recursion
    , parsename(hier.IdHierarchy,2) Acct2Id
    , parsename(hier.NameHierarchy,2) Acct2Name
    , parsename(hier.IdHierarchy,3) Acct3Id
    , parsename(hier.NameHierarchy,3) Acct3Name
    , parsename(hier.IdHierarchy,4) Acct4Id
    , parsename(hier.NameHierarchy,4) Acct4Name
    , hier.CustomerCount
    /* fantastic up to this point. Next block of code is what causes problem.
       Logic of code is "sum of CustomerCount for this location and all branches below in this branch of hierarchy"
       In live environment, goes from taking 3 seconds to 4 minutes by adding this one calc */
    , (
        SELECT
            sum(children.CustomerCount)
        FROM
            AccountHierarchy Children
        WHERE
            hier.IdHierarchy = right(children.IdHierarchy, (1 /*length of id field*/ * hier.HierarchyLevel) + hier.HierarchyLevel - 1 /*for periods inbetween ids*/)
            --"where this location's idhierarchy is within child idhierarchy"
            --previously tried a charindex(hier.IdHierarchy,children.IdHierarchy)>0, but that performed even worse
        ) TotalCustomerCount
FROM
    AccountHierarchy hier
ORDER BY
    hier.HierarchySort

drop table tempdb.dbo.Account

11/20/2013 UPDATE

Some of the suggested solutions got my juices flowing, and I tried a new approach that comes close, but introduces a new/different obstacle. Honestly, I don't know if this warrants a separate post or not, but it's related to the solution of this problem.

I decided that what makes sum(CustomerCount) difficult is identifying children in the context of a hierarchy that starts at the top and builds down. So I started by creating a hierarchy that builds from the bottom up, using a root defined as "accounts that are not parent to any other account" and doing the recursive join backwards (root.ParentId = recurse.Acctid).

This way I could just add the child customer count to the parent as the recursion happens. Because of the reporting and level columns I need, I run this bottom-up CTE in addition to the top-down one, then join the two on account id. This approach turns out to be much faster than the original CustomerCount subquery in the outer query, but I ran into a few obstacles.

First, I was inadvertently counting the customer count more than once for accounts that are parent to multiple children. Some accounts were double or triple counted, depending on how many children they had. My solution was to create yet another CTE that counts how many child nodes an account has and to divide that account's CustomerCount by this number during recursion, so that when I add up the whole branch the account is not double counted.

So at this point, the results of this new version are not correct, but I know why. The BottomUp CTE is creating duplicates. On each recursion pass, it looks for anything in the root (bottom-level children) that is a child of an account in the account table. On the third recursion, it picks up the same accounts it did in the second and puts them in again.
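The duplication can be seen in isolation with a stripped-down walk that keeps only the ids. This is a sketch, not the real view: @Account is a stand-in table variable holding the Acctid/ParentId pairs from the sample data, and RowsProduced is a made-up column name.

```sql
-- Only the id columns are needed to see the duplicates.
DECLARE @Account TABLE
(
    Acctid varchar(1) NOT NULL
    , ParentId varchar(1) NULL
);

INSERT INTO @Account (Acctid, ParentId)
VALUES ('A', NULL), ('B', 'A'), ('C', 'B'), ('D', 'B')
     , ('E', 'C'), ('F', 'A'), ('G', 'C');

WITH BottomUp AS
(
    -- anchor: accounts that are not parent to any other account
    SELECT leaf.Acctid, leaf.ParentId
    FROM @Account AS leaf
    WHERE NOT EXISTS (SELECT 1 FROM @Account AS other WHERE other.ParentId = leaf.Acctid)
    UNION ALL
    -- step up: the parent of each row produced so far
    SELECT parent.Acctid, parent.ParentId
    FROM @Account AS parent
    INNER JOIN BottomUp AS child ON child.ParentId = parent.Acctid
)
SELECT Acctid, COUNT(*) AS RowsProduced
FROM BottomUp
GROUP BY Acctid
ORDER BY Acctid;
```

On the sample hierarchy this gives A = 4, B = 3, C = 2 and 1 for each of D, E, F and G. An account shows up once for every bottom-level account underneath it, because each of those starts its own walk to the top, so dividing by the number of direct children does not cancel the repeats once a branch is more than one level deep.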

Any ideas on how to do a bottom-up CTE, or does this spark any other ideas?

USE tempdb
GO

CREATE TABLE dbo.Account
(
    Acctid varchar(1) NOT NULL
    , Name varchar(30) NULL
    , ParentId varchar(1) NULL
    , CustomerCount int NULL
);

INSERT Account
SELECT 'A','Best Bet',NULL,1 UNION ALL
SELECT 'B','eStore','A',2 UNION ALL
SELECT 'C','Big Bens','B',3 UNION ALL
SELECT 'D','Mr. Jimbo','B',4 UNION ALL
SELECT 'E','Dr. John','C',5 UNION ALL
SELECT 'F','Brick','A',6 UNION ALL
SELECT 'G','Mortar','C',7
;

With AccountHierarchy AS
(
    --Root values have no parent
    SELECT
        Root.AcctId                                         AccountId
        , Root.Name                                         AccountName
        , Root.ParentId                                     ParentId
        , 1                                                 HierarchyLevel
        , cast(Root.Acctid as varchar(4000))                IdHierarchy     --highest parent reads right to left as in id3.Acctid2.Acctid1
        , cast(replace(Root.Name,'.','') as varchar(4000))  NameHierarchy   --highest parent reads right to left as in name3.name2.name1 (replace '.' so name parse is easy in last step)
        , cast(Root.Acctid as varchar(4000))                HierarchySort   --reverse of above, read left to right name1.name2.name3 for sorting on reporting only
        , cast(Root.Acctid as varchar(4000))                HierarchyMatch
        , cast(Root.Name as varchar(4000))                  HierarchyLabel  --use for labels on reporting only, indents names under sorted hierarchy
        , Root.CustomerCount                                CustomerCount
    FROM
        tempdb.dbo.account Root
    WHERE
        Root.ParentID is null

    UNION ALL

    SELECT
        Recurse.Acctid                                      AccountId
        , Recurse.Name                                      AccountName
        , Recurse.ParentId                                  ParentId
        , Root.HierarchyLevel + 1                           HierarchyLevel  --next level in hierarchy
        , cast(cast(recurse.Acctid as varchar(40)) +'.'+ Root.IdHierarchy as varchar(4000))   IdHierarchy     --cast because in real system this is a uniqueidentifier type needs converting
        , cast(replace(recurse.Name,'.','') +'.'+ Root.NameHierarchy as varchar(4000))        NameHierarchy   --replace '.' for parsing in last step, cast to make room for lots of sub levels down the hierarchy
        , cast(Root.AccountName +'.'+ Recurse.Name as varchar(4000))                          HierarchySort
        , CAST(CAST(Root.HierarchyMatch as varchar(40)) +'.'+ cast(recurse.Acctid as varchar(40)) as varchar(4000))   HierarchyMatch
        , cast(space(root.HierarchyLevel * 4) + Recurse.Name as varchar(4000))                HierarchyLabel
        , Recurse.CustomerCount                             CustomerCount
    FROM
        tempdb.dbo.account Recurse INNER JOIN
        AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)
, Nodes as
(
    --counts how many branches are below for any account that is parent to another
    select
        node.ParentId Acctid
        , cast(count(1) as float) Nodes
    from AccountHierarchy node
    group by ParentId
)
, BottomUp as
(
    --creates the hierarchy starting at accounts that are not parent to any other
    select
        Root.Acctid
        , root.ParentId
        , cast(isnull(root.customercount,0) as float) CustomerCount
    from
        tempdb.dbo.Account Root
    where
        not exists ( select 1 from tempdb.dbo.Account OtherAccts where root.Acctid = OtherAccts.ParentId)

    union all

    select
        Recurse.Acctid
        , Recurse.ParentId
        , root.CustomerCount + cast ((isnull(recurse.customercount,0) / nodes.nodes) as float) CustomerCount
        -- divide the recurse customercount by number of nodes to prevent duplicate customer count on accts that are parent to multiple children, see customercount cte next
    from
        tempdb.dbo.Account Recurse inner join
        BottomUp Root on root.ParentId = recurse.acctid inner join
        Nodes on nodes.Acctid = recurse.Acctid
)
, CustomerCount as
(
    select
        sum(CustomerCount) TotalCustomerCount
        , hier.acctid
    from
        BottomUp hier
    group by
        hier.Acctid
)
SELECT
    hier.AccountId
    , Hier.AccountName
    , hier.ParentId
    , hier.HierarchyLevel
    , hier.IdHierarchy
    , hier.NameHierarchy
    , hier.HierarchyLabel
    , hier.hierarchymatch
    , parsename(hier.IdHierarchy,1) Acct1Id
    , parsename(hier.NameHierarchy,1) Acct1Name     --This is why we stripped out '.' during recursion
    , parsename(hier.IdHierarchy,2) Acct2Id
    , parsename(hier.NameHierarchy,2) Acct2Name
    , parsename(hier.IdHierarchy,3) Acct3Id
    , parsename(hier.NameHierarchy,3) Acct3Name
    , parsename(hier.IdHierarchy,4) Acct4Id
    , parsename(hier.NameHierarchy,4) Acct4Name
    , hier.CustomerCount
    , customercount.TotalCustomerCount
FROM
    AccountHierarchy hier inner join
    CustomerCount on customercount.acctid = hier.accountid
ORDER BY
    hier.HierarchySort

drop table tempdb.dbo.Account
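One variation on the bottom-up idea that might sidestep the double counting is to start the walk at every account instead of only at the bottom-level ones, and to carry each account's own count unchanged up to each of its ancestors. The sketch below is untested against the live data and says nothing about how it would perform there; @Account, Walk, TargetId and NextParentId are made-up names, and the rows are the second set of sample data.

```sql
-- Stand-in for dbo.Account with the sample counts 1 through 7.
DECLARE @Account TABLE
(
    Acctid varchar(1) NOT NULL
    , ParentId varchar(1) NULL
    , CustomerCount int NULL
);

INSERT INTO @Account (Acctid, ParentId, CustomerCount)
VALUES ('A', NULL, 1), ('B', 'A', 2), ('C', 'B', 3), ('D', 'B', 4)
     , ('E', 'C', 5), ('F', 'A', 6), ('G', 'C', 7);

WITH Walk AS
(
    -- anchor: every account contributes its own count to itself
    SELECT
        a.Acctid AS TargetId
        , a.ParentId AS NextParentId
        , ISNULL(a.CustomerCount, 0) AS CustomerCount
    FROM @Account AS a
    UNION ALL
    -- step up: hand the same, unchanged count to the next ancestor
    SELECT
        p.Acctid
        , p.ParentId
        , w.CustomerCount
    FROM Walk AS w
    INNER JOIN @Account AS p ON p.Acctid = w.NextParentId
)
SELECT TargetId AS AccountId, SUM(CustomerCount) AS TotalCustomerCount
FROM Walk
GROUP BY TargetId
ORDER BY TargetId;
```

Each account reaches any given ancestor exactly once, so no division by node count is needed. On the sample rows the totals come out as A = 28, B = 21, C = 15, D = 4, E = 5, F = 6 and G = 7. The result could be joined to AccountHierarchy on account id the same way the CustomerCount CTE is joined above. The trade-off is that the intermediate set holds one row per account per ancestor, so it grows with the depth of the hierarchy.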
