How can I efficiently traverse graph data with this pattern?

I have some relations embodying a directed acyclic graph that includes patterns similar to the following:

I'm looking for an efficient way to traverse this graph data. Here is an example of the seemingly simple task of counting descendants of node 0:

db<>fiddle

DROP TABLE IF EXISTS #edges;CREATE TABLE #edges(tail int, head int);INSERT INTO #edges(tail,head) VALUES    (0,1), (5, 6),  (10,11), (15,16),    (0,2), (5, 7),  (10,12), (15,17),    (1,2), (6, 7),  (11,12), (16,17),    (1,3), (7, 8),  (11,13), (17,18),    (2,3), (7, 9),  (12,13), (17,19),    (2,4), (8, 9),  (12,14), (18,19),    (3,4), (8,10),  (13,14),    (3,5), (9,10),  (13,15),    (4,5), (9,11),  (14,15),    (4,6),          (14,16);WITH descendents(node)AS(    SELECT 0 as node    UNION ALL    SELECT head as node FROM descendents as prior    JOIN #edges ON prior.node = tail)SELECT    (SELECT COUNT(node) FROM descendents)             as total_nodes,    (SELECT COUNT(node) FROM        (SELECT DISTINCT node FROM descendents) as d) as distinct_nodes

This results in the following:

total_nodes | distinct_nodes      10512 |             20

Every path is visited instead of each node once

total_nodes seems to grow at about 2^n where n is the number of nodes in the example. This is because every possible path is traversed rather than each node once. This n=29 example results in 1,305,729 total_nodes and took 75 seconds to complete on my local instance of SQL Server Express.

The obvious strategy is to only visit previously unvisited nodes on each iteration.

Excluding redundant additions using `WHERE...NOT IN` does not seem to be supported

The most direct method of preventing visits to previously visited nodes would seem to be to filter right in the recursive member of the CTE as follows:

SELECT head as node FROM descendents as priorJOIN #edges ON prior.node = tailWHERE head NOT IN (SELECT node from descendents)

This produces the error "Recursive member of a common table expression 'descendents' has multiple recursive references." I suspect this is an example of this restriction from the documenation:

The FROM clause of a recursive member must refer only one time to the CTE expression_name.

No success avoiding previously visited nodes in the CTE's recursive member

I've tried a few different techniques for avoiding previously visited nodes as recursion progresses in the CTE. I haven't found a technique that works with this graph pattern, however. The limitation seems to boil down to the following statement from the documentation:

...aggregate functions in the recursive part of the CTE are applied to the set for the current recursion level and not to the set for the CTE. Functions like ROW_NUMBER operate only on the subset of data passed to them by the current recursion level and not the entire set of data passed to the recursive part of the CTE.

Previously visited nodes appear at different recursion levels. The above limitation seems to rule out the deduplication across recursion levels that would be needed to make this query efficient.

Query Analysis

The following is from a run of a query run on 42 nodes invoked on my local instance of SQL Server Express. (Note that half the nodes are commented out in the fiddle because running with all 42 nodes causes the fiddle to timeout.)

The query was run as follows:

set statistics xml on;SET STATISTICS IO ON;SET STATISTICS TIME ON;WITH descendents(node)AS(    SELECT 0 as node    UNION ALL    SELECT head as node FROM descendents as prior    JOIN #edges ON prior.node = tail)SELECT DISTINCT    nodeFROM descendentsSET STATISTICS TIME OFF;SET STATISTICS IO OFF;set statistics xml off;

This resulted in the following messages:

Started executing query at Line 1(76 rows affected)
SQL Server parse and compile time:CPU time = 0 ms, elapsed time = 14 ms.(42 rows affected)
Table 'Worktable'. Scan count 2, logical reads 943777243, physical reads 0, read-ahead reads 12117, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#edges______________________________________________________________________________________________________________000000002516'. Scan count 1, logical reads 157296207, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row affected)
SQL Server Execution Times:CPU time = 2972344 ms, elapsed time = 3319271 ms.Total execution time: 00:55:19.424

Here is the execution plan:

And these are screenshots of the tooltips for what would seem to be the most time consuming steps:

Possible alternatives

I can think of a few programmatic approaches that achieve node traversal without relying on CTE recursion. I'm reluctant to prematurely abandon CTEs given the abundance of advice to prefer set logic where possible. I expect that there are some built-in SQL Server features that would support some improved approaches. The other approaches I've contemplated are as follows:

stored procedure that implements recursion programmatically
hierarchyid
SQL Graph

I have limited or no experience with each of these approaches and so am interested in advice about the tradeoffs of these approaches for the above example.

Questions

Is there a more efficient way to traverse the graph in the above example using recursive CTEs?
Of the above alternatives, what are the tradeoffs?
Are there any other alternatives I should consider?

How can I efficiently traverse graph data with this pattern?

Every path is visited instead of each node once

Excluding redundant additions using `WHERE...NOT IN` does not seem to be supported

No success avoiding previously visited nodes in the CTE's recursive member

Query Analysis

Possible alternatives

Questions

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112

Every path is visited instead of each node once

Excluding redundant additions using WHERE...NOT IN does not seem to be supported

No success avoiding previously visited nodes in the CTE's recursive member

Query Analysis

Possible alternatives

Questions

Trending Articles

Excluding redundant additions using `WHERE...NOT IN` does not seem to be supported