Update below
I have a table of accounts with a typical account/parent-account structure to represent a hierarchy of accounts (SQL Server 2012). I created a VIEW using a CTE to resolve the hierarchy, and on the whole it works beautifully and as intended: I can query the hierarchy at any level and see the branches easily.
There is one business-logic field that needs to be returned as a function of the hierarchy. A field in each account record describes the size of the business (we'll call it CustomerCount). The report needs to roll up CustomerCount over the whole branch. In other words, given an account, I need the sum of CustomerCount for that account plus every descendant in every branch below it in the hierarchy.
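To pin down what the roll-up has to compute, here is a minimal sketch (not the view from this question) that loads the first script's sample rows into a temp table and lets a recursive CTE emit one (AncestorId, DescendantId) pair for each account and every account in its subtree, including itself. Summing CustomerCount per ancestor then gives the branch total. `#Account`, `Subtree` and `#SubtreeTotals` are hypothetical names chosen for the example.

```sql
-- Hypothetical stand-in for dbo.Account, loaded with the first sample data set.
CREATE TABLE #Account
(
    Acctid        varchar(1)  NOT NULL PRIMARY KEY,
    Name          varchar(30) NULL,
    ParentId      varchar(1)  NULL,
    CustomerCount int         NULL
);

INSERT #Account (Acctid, Name, ParentId, CustomerCount)
VALUES ('A', 'Best Bet',  NULL,  21),
       ('B', 'eStore',    'A',   30),
       ('C', 'Big Bens',  'B',   75),
       ('D', 'Mr. Jimbo', 'B',   50),
       ('E', 'Dr. John',  'C',  100),
       ('F', 'Brick',     'A',  222),
       ('G', 'Mortar',    'C',  153);

WITH Subtree AS
(
    -- Anchor: every account belongs to its own subtree.
    SELECT a.Acctid AS AncestorId, a.Acctid AS DescendantId
    FROM #Account AS a

    UNION ALL

    -- Recursive step: the children of a known descendant are descendants too.
    SELECT s.AncestorId, c.Acctid
    FROM Subtree AS s
    INNER JOIN #Account AS c ON c.ParentId = s.DescendantId
)
SELECT s.AncestorId AS AccountId,
       SUM(ISNULL(d.CustomerCount, 0)) AS TotalCustomerCount
INTO #SubtreeTotals
FROM Subtree AS s
INNER JOIN #Account AS d ON d.Acctid = s.DescendantId
GROUP BY s.AncestorId;

-- Expected (hand-computed): A 651, B 408, C 328, D 50, E 100, F 222, G 153
SELECT AccountId, TotalCustomerCount
FROM #SubtreeTotals
ORDER BY AccountId;

DROP TABLE #Account;
```

Because each ancestor/descendant pair is produced exactly once, no path-string comparison is needed. How this performs on the real data set is untested here.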
I successfully calculated the field using a hierarchy path built within the CTE, which looks like acct4.acct3.acct2.acct1 (the account itself first, the top-level parent last). The problem is simply making it run fast. Without this one calculated field, the query runs in about 3 seconds; with it, the query takes about 4 minutes.
Here is the best version I've been able to come up with that returns the correct results. I'm looking for ideas on how to restructure this AS A VIEW without such a huge performance penalty.
I understand why this version is slow (the WHERE clause has to compute RIGHT() on the child path for every comparison), but I can't think of another way to structure it and still get the same results.
Here's some sample code to build a table and do the CTE pretty much exactly as it works in my environment.
    Use Tempdb
    go

    CREATE TABLE dbo.Account
    (
          Acctid varchar(1) NOT NULL
        , Name varchar(30) NULL
        , ParentId varchar(1) NULL
        , CustomerCount int NULL
    );

    INSERT Account
    SELECT 'A','Best Bet',NULL,21 UNION ALL
    SELECT 'B','eStore','A',30 UNION ALL
    SELECT 'C','Big Bens','B',75 UNION ALL
    SELECT 'D','Mr. Jimbo','B',50 UNION ALL
    SELECT 'E','Dr. John','C',100 UNION ALL
    SELECT 'F','Brick','A',222 UNION ALL
    SELECT 'G','Mortar','C',153;

    WITH AccountHierarchy AS
    (
        --Root values have no parent
        SELECT
              Root.AcctId AccountId
            , Root.Name AccountName
            , Root.ParentId ParentId
            , 1 HierarchyLevel
            , cast(Root.Acctid as varchar(4000)) IdHierarchy --highest parent reads right to left as in id3.Acctid2.Acctid1
            , cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy --highest parent reads right to left as in name3.name2.name1 (replace '.' so name parse is easy in last step)
            , cast(Root.Name as varchar(4000)) HierarchySort --reverse of above, read left to right name1.name2.name3 for sorting on reporting only
            , cast(Root.Name as varchar(4000)) HierarchyLabel --use for labels on reporting only, indents names under sorted hierarchy
            , Root.CustomerCount CustomerCount
        FROM tempdb.dbo.account Root
        WHERE Root.ParentID is null

        UNION ALL

        SELECT
              Recurse.Acctid AccountId
            , Recurse.Name AccountName
            , Recurse.ParentId ParentId
            , Root.HierarchyLevel + 1 HierarchyLevel --next level in hierarchy
            , cast(cast(recurse.Acctid as varchar(40)) +'.'+ Root.IdHierarchy as varchar(4000)) IdHierarchy --cast because in real system this is a uniqueidentifier type needs converting
            , cast(replace(recurse.Name,'.','') +'.'+ Root.NameHierarchy as varchar(4000)) NameHierarchy --replace '.' for parsing in last step, cast to make room for lots of sub levels down the hierarchy
            , cast(Root.HierarchySort +'.'+ Recurse.Name as varchar(4000)) HierarchySort --append to the full parent path (not just the parent's name) so the sort follows the whole branch
            , cast(space(root.HierarchyLevel * 4) + Recurse.Name as varchar(4000)) HierarchyLabel
            , Recurse.CustomerCount CustomerCount
        FROM tempdb.dbo.account Recurse
        INNER JOIN AccountHierarchy Root on Root.AccountId = Recurse.ParentId
    )
    SELECT
          hier.AccountId
        , Hier.AccountName
        , hier.ParentId
        , hier.HierarchyLevel
        , hier.IdHierarchy
        , hier.NameHierarchy
        , hier.HierarchyLabel
        , parsename(hier.IdHierarchy,1) Acct1Id
        , parsename(hier.NameHierarchy,1) Acct1Name --This is why we stripped out '.' during recursion
        , parsename(hier.IdHierarchy,2) Acct2Id
        , parsename(hier.NameHierarchy,2) Acct2Name
        , parsename(hier.IdHierarchy,3) Acct3Id
        , parsename(hier.NameHierarchy,3) Acct3Name
        , parsename(hier.IdHierarchy,4) Acct4Id
        , parsename(hier.NameHierarchy,4) Acct4Name
        , hier.CustomerCount
        /*
           fantastic up to this point. Next block of code is what causes problem.
           Logic of code is "sum of CustomerCount for this location and all branches below in this branch of hierarchy"
           In live environment, goes from taking 3 seconds to 4 minutes by adding this one calc
        */
        , (
            SELECT sum(children.CustomerCount)
            FROM AccountHierarchy Children
            WHERE hier.IdHierarchy = right(children.IdHierarchy, (1 /*length of id field*/ * hier.HierarchyLevel) + hier.HierarchyLevel - 1 /*for periods in between ids*/)
            --"where this location's idhierarchy is within child idhierarchy"
            --previously tried a charindex(hier.IdHierarchy,children.IdHierarchy)>0, but that performed even worse
          ) TotalCustomerCount
    FROM AccountHierarchy hier
    ORDER BY hier.HierarchySort;

    drop table tempdb.dbo.Account;
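One way the correlated subquery above might be restructured, offered as an untested sketch: SQL Server does not materialize CTEs, so every reference to AccountHierarchy is expanded on its own, and a correlated subquery over the recursive CTE may redo much of the recursion work. The version below references the CTE twice in a single join, builds the id path root first (the same idea as HierarchyMatch in the update below), and finds descendants with a prefix LIKE instead of RIGHT(). `#Account`, `Path` and `#PrefixTotals` are hypothetical names; with uniqueidentifier ids, the child id would need a CAST to varchar(36) before concatenation.

```sql
-- Hypothetical stand-in for dbo.Account, same sample data as the script above.
CREATE TABLE #Account
(
    Acctid        varchar(1)  NOT NULL PRIMARY KEY,
    Name          varchar(30) NULL,
    ParentId      varchar(1)  NULL,
    CustomerCount int         NULL
);

INSERT #Account (Acctid, Name, ParentId, CustomerCount)
VALUES ('A', 'Best Bet',  NULL,  21), ('B', 'eStore',   'A',  30),
       ('C', 'Big Bens',  'B',   75), ('D', 'Mr. Jimbo','B',  50),
       ('E', 'Dr. John',  'C',  100), ('F', 'Brick',    'A', 222),
       ('G', 'Mortar',    'C',  153);

WITH AccountHierarchy AS
(
    -- Anchor: roots; the path starts with the root id (root first).
    SELECT a.Acctid AS AccountId,
           a.CustomerCount,
           CAST(a.Acctid AS varchar(4000)) AS Path
    FROM #Account AS a
    WHERE a.ParentId IS NULL

    UNION ALL

    -- Recursive step: append the child's id to the parent's path.
    SELECT c.Acctid,
           c.CustomerCount,
           CAST(p.Path + '.' + c.Acctid AS varchar(4000))
    FROM #Account AS c
    INNER JOIN AccountHierarchy AS p ON p.AccountId = c.ParentId
)
SELECT hier.AccountId,
       SUM(ISNULL(children.CustomerCount, 0)) AS TotalCustomerCount
INTO #PrefixTotals
FROM AccountHierarchy AS hier
INNER JOIN AccountHierarchy AS children
    -- In hier's subtree: the same path, or hier's path followed by '.'
    -- (the separator keeps an id from matching a longer id that starts with it).
    ON children.Path = hier.Path
    OR children.Path LIKE hier.Path + '.%'
GROUP BY hier.AccountId;

SELECT AccountId, TotalCustomerCount
FROM #PrefixTotals
ORDER BY AccountId;

DROP TABLE #Account;
```

Whether the optimizer actually picks a cheaper plan on the real data would have to be confirmed from the execution plan.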
11/20/2013 UPDATE
Some of the suggested solutions got my juices flowing, and I tried a new approach that comes close but introduces a different obstacle. Honestly, I don't know whether this warrants a separate post, but it's related to solving this problem.
I concluded that what made SUM(CustomerCount) difficult was identifying children in a hierarchy that is built from the top down. So I created a hierarchy that builds from the bottom up, anchored on "accounts that are not parent to any other account", with the recursive join reversed (root.ParentId = recurse.Acctid).
This way I could simply add the child's customer count to the parent as the recursion happens. Because of how I need the reporting and levels, I run this bottom-up CTE in addition to the top-down one and join the two on account ID. This approach turns out to be much faster than the original TotalCustomerCount subquery, but I ran into a few obstacles.
First, I was inadvertently counting CustomerCount more than once for accounts that are parent to multiple children: some accounts were double- or triple-counted, by the number of children they had. My workaround was yet another CTE (Nodes) that counts how many children an account has, and to divide the account's CustomerCount by that number during recursion, so that the account is not counted multiple times when the whole branch is added up.
So at this point the results of this new version are not correct, but I know why: the BottomUp CTE is creating duplicates. On each pass, the recursion looks for any account that is the parent of a row in Root (starting with the bottom-level children). On the third pass, it picks up the same accounts it did on the second pass and adds them again.
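A hand trace of the second script below suggests why dividing by the child count cannot fully cancel the duplication (this is reasoning on the sample data, not verified against the live system). Write leaves(x) for the accounts in x's subtree that are parent to no other account, and c(x) for the number of direct children of x, which is what the Nodes CTE computes. Because BottomUp is anchored only on leaves and walks exactly one path from each leaf to the root, the number of BottomUp rows for an account equals its leaf count, not its child count:

```latex
\[
  \bigl|\{\, r \in \mathrm{BottomUp} : r.\mathrm{Acctid} = x \,\}\bigr| \;=\; \lvert \mathrm{leaves}(x) \rvert
\]
\[
  \mathrm{leaves}(B) = \{D, E, G\} \;\Rightarrow\; 3 \text{ rows, while } c(B) = 2
\]
\[
  \mathrm{leaves}(A) = \{D, E, F, G\} \;\Rightarrow\; 4 \text{ rows, while } c(A) = 2
\]
```

On top of that, each row carries the running total of the path below it, so grouping the rows by Acctid also adds descendants' counts more than once. Tracing the sample counts 1 through 7 by hand gives 22 for B and 30 for A, while the correct roll-ups are 2 + 3 + 4 + 5 + 7 = 21 and 1 + 2 + ... + 7 = 28; C comes out right (15) only because its two leaves are also its two direct children.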
Any ideas on how to write a correct bottom-up CTE, or does this spark any other ideas?
    Use Tempdb
    go

    CREATE TABLE dbo.Account
    (
          Acctid varchar(1) NOT NULL
        , Name varchar(30) NULL
        , ParentId varchar(1) NULL
        , CustomerCount int NULL
    );

    INSERT Account
    SELECT 'A','Best Bet',NULL,1 UNION ALL
    SELECT 'B','eStore','A',2 UNION ALL
    SELECT 'C','Big Bens','B',3 UNION ALL
    SELECT 'D','Mr. Jimbo','B',4 UNION ALL
    SELECT 'E','Dr. John','C',5 UNION ALL
    SELECT 'F','Brick','A',6 UNION ALL
    SELECT 'G','Mortar','C',7;

    WITH AccountHierarchy AS
    (
        --Root values have no parent
        SELECT
              Root.AcctId AccountId
            , Root.Name AccountName
            , Root.ParentId ParentId
            , 1 HierarchyLevel
            , cast(Root.Acctid as varchar(4000)) IdHierarchy --highest parent reads right to left as in id3.Acctid2.Acctid1
            , cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy --highest parent reads right to left as in name3.name2.name1 (replace '.' so name parse is easy in last step)
            , cast(Root.Name as varchar(4000)) HierarchySort --reverse of above, read left to right name1.name2.name3 for sorting on reporting only
            , cast(Root.Acctid as varchar(4000)) HierarchyMatch
            , cast(Root.Name as varchar(4000)) HierarchyLabel --use for labels on reporting only, indents names under sorted hierarchy
            , Root.CustomerCount CustomerCount
        FROM tempdb.dbo.account Root
        WHERE Root.ParentID is null

        UNION ALL

        SELECT
              Recurse.Acctid AccountId
            , Recurse.Name AccountName
            , Recurse.ParentId ParentId
            , Root.HierarchyLevel + 1 HierarchyLevel --next level in hierarchy
            , cast(cast(recurse.Acctid as varchar(40)) +'.'+ Root.IdHierarchy as varchar(4000)) IdHierarchy --cast because in real system this is a uniqueidentifier type needs converting
            , cast(replace(recurse.Name,'.','') +'.'+ Root.NameHierarchy as varchar(4000)) NameHierarchy --replace '.' for parsing in last step, cast to make room for lots of sub levels down the hierarchy
            , cast(Root.HierarchySort +'.'+ Recurse.Name as varchar(4000)) HierarchySort --append to the full parent path (not just the parent's name) so the sort follows the whole branch
            , cast(Root.HierarchyMatch +'.'+ cast(recurse.Acctid as varchar(40)) as varchar(4000)) HierarchyMatch --root-first id path; casting the accumulated path to varchar(40) would truncate it once ids are uniqueidentifiers
            , cast(space(root.HierarchyLevel * 4) + Recurse.Name as varchar(4000)) HierarchyLabel
            , Recurse.CustomerCount CustomerCount
        FROM tempdb.dbo.account Recurse
        INNER JOIN AccountHierarchy Root on Root.AccountId = Recurse.ParentId
    )
    , Nodes as
    (
        --counts how many branches are below for any account that is parent to another
        select
              node.ParentId Acctid
            , cast(count(1) as float) Nodes
        from AccountHierarchy node
        group by ParentId
    )
    , BottomUp as
    (
        --creates the hierarchy starting at accounts that are not parent to any other
        select
              Root.Acctid
            , root.ParentId
            , cast(isnull(root.customercount,0) as float) CustomerCount
        from tempdb.dbo.Account Root
        where not exists
        (
            select 1
            from tempdb.dbo.Account OtherAccts
            where root.Acctid = OtherAccts.ParentId
        )

        union all

        select
              Recurse.Acctid
            , Recurse.ParentId
            , root.CustomerCount + cast((isnull(recurse.customercount,0) / nodes.nodes) as float) CustomerCount -- divide the recurse customercount by number of nodes to prevent duplicate customer count on accts that are parent to multiple children, see customercount cte next
        from tempdb.dbo.Account Recurse
        inner join BottomUp Root on root.ParentId = recurse.acctid
        inner join Nodes on nodes.Acctid = recurse.Acctid
    )
    , CustomerCount as
    (
        select
              sum(CustomerCount) TotalCustomerCount
            , hier.acctid
        from BottomUp hier
        group by hier.Acctid
    )
    SELECT
          hier.AccountId
        , Hier.AccountName
        , hier.ParentId
        , hier.HierarchyLevel
        , hier.IdHierarchy
        , hier.NameHierarchy
        , hier.HierarchyLabel
        , hier.hierarchymatch
        , parsename(hier.IdHierarchy,1) Acct1Id
        , parsename(hier.NameHierarchy,1) Acct1Name --This is why we stripped out '.' during recursion
        , parsename(hier.IdHierarchy,2) Acct2Id
        , parsename(hier.NameHierarchy,2) Acct2Name
        , parsename(hier.IdHierarchy,3) Acct3Id
        , parsename(hier.NameHierarchy,3) Acct3Name
        , parsename(hier.IdHierarchy,4) Acct4Id
        , parsename(hier.NameHierarchy,4) Acct4Name
        , hier.CustomerCount
        , customercount.TotalCustomerCount
    FROM AccountHierarchy hier
    inner join CustomerCount on customercount.acctid = hier.accountid
    ORDER BY hier.HierarchySort;

    drop table tempdb.dbo.Account;
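Here is a sketch of a bottom-up CTE that avoids both the duplicates and the Nodes division, under the assumption that every account's own CustomerCount should reach each of its ancestors exactly once. It anchors on every account rather than only on leaves, and passes the source account's count up unchanged instead of a running total. Because a tree has exactly one path from an account to the root, each count lands on each ancestor once, and a plain GROUP BY yields the roll-up. `#Account`, `SourceCount` and `#BottomUpTotals` are hypothetical names; the result could then be joined to the top-down AccountHierarchy on account ID, as in the script above.

```sql
-- Hypothetical stand-in for dbo.Account, loaded with the update's sample data (counts 1..7).
CREATE TABLE #Account
(
    Acctid        varchar(1)  NOT NULL PRIMARY KEY,
    Name          varchar(30) NULL,
    ParentId      varchar(1)  NULL,
    CustomerCount int         NULL
);

INSERT #Account (Acctid, Name, ParentId, CustomerCount)
VALUES ('A', 'Best Bet',  NULL, 1), ('B', 'eStore',    'A', 2),
       ('C', 'Big Bens',  'B',  3), ('D', 'Mr. Jimbo', 'B', 4),
       ('E', 'Dr. John',  'C',  5), ('F', 'Brick',     'A', 6),
       ('G', 'Mortar',    'C',  7);

WITH BottomUp AS
(
    -- Anchor: every account (not only the leaves) starts one walk toward
    -- the root, carrying its own CustomerCount.
    SELECT a.Acctid AS AcctId,
           a.ParentId,
           ISNULL(a.CustomerCount, 0) AS SourceCount
    FROM #Account AS a

    UNION ALL

    -- Recursive step: pass the same count, unchanged, up to the parent.
    -- No running total and no division by the number of children.
    SELECT p.Acctid,
           p.ParentId,
           b.SourceCount
    FROM BottomUp AS b
    INNER JOIN #Account AS p ON p.Acctid = b.ParentId
)
SELECT AcctId, SUM(SourceCount) AS TotalCustomerCount
INTO #BottomUpTotals
FROM BottomUp
GROUP BY AcctId;

-- Expected (hand-computed): A 28, B 21, C 15, D 4, E 5, F 6, G 7
SELECT AcctId, TotalCustomerCount
FROM #BottomUpTotals
ORDER BY AcctId;

DROP TABLE #Account;
```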