I have a table of hierarchical data, and I want to select entries such that 1) only matches that are descendants of a given parent are included, and 2) the full hierarchical path of each matched entry is returned.
I am doing this with two CTEs: one that gets all recursive children from the specified parent, and another that selects matches and gets all the recursive parents of those matches. The main query then joins and groups these two CTEs.
However, the performance is not good. Running either CTE on its own (i.e., only getting entries that are descendants, or only getting path hierarchies) returns results immediately, but running both CTEs together on a table with millions of entries can take minutes to hours, depending on the number of string matches. Even a single match can be slow. The query also doesn't return the first match until all matches have been found.
Is it possible to improve the performance of the query below, or at the very least, make it start returning results as soon as they are found?
The table has this structure:
CREATE TABLE [Files](
    [fileId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    [name] TEXT NOT NULL,
    [parentFileId] INTEGER NULL,
    UNIQUE ( [name] , [parentFileId] )
);
CREATE INDEX [Files_parentFileId] ON [Files]([parentFileId]);
CREATE INDEX [Files_name] ON [Files]([name]);
And the fastest query I've been able to produce is:
-- The [@Context] CTE lists all the fileIds that are descendants
-- of the input @rootFileId. The query should only return
-- matches that are in this list of fileIds
WITH [@Context]
AS (
    -- Starting with the ID of the root entry..
    SELECT @rootFileId [fileId]
    UNION
    -- ..build up all the IDs of the children
    SELECT [Files].[fileId]
    FROM [Files]
    JOIN [@Context] ON [Files].[parentFileId] = [@Context].[fileId]
)
-- The [@FilePath] CTE selects each entry that matches the given
-- input @name and all entries that are its parents
, [@FilePath]
AS (
    -- Starting with the entries that match the given @name..
    SELECT -1 [depth]
        , [Files].[fileId]
        , [Files].[parentFileId]
        , [Files].[name]
    FROM [Files]
    WHERE [name] LIKE @name
    UNION
    -- ..build up all the IDs of the parents
    SELECT [@FilePath].[depth] - 1
        , [@FilePath].[fileId] -- for grouping entries by the matched entry's id
        , [Files].[parentFileId]
        , [Files].[name]
    FROM [Files]
    JOIN [@FilePath] ON [Files].[fileId] = [@FilePath].[parentFileId]
)
-- Group all the parent entries into one column with each found entry
-- and exclude any entry that wasn't on the [@Context] list
SELECT [@FilePath].[fileId],
    [@FilePath].[name],
    GROUP_CONCAT([@FilePath].[parentFileId]) [pathFileId],
    GROUP_CONCAT([@FilePath].[name], '\') [pathname]
FROM [@FilePath]
JOIN [@Context] ON [@Context].[fileId] = [@FilePath].[fileId]
GROUP BY [@FilePath].[fileId]
ORDER BY [@FilePath].[depth];
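In case it helps anyone reproduce this, here is a minimal sketch of how the query can be run against a tiny in-memory database using Python's `sqlite3` module. The sample rows, the `run_search` helper, and the parameter values are made up for illustration and are not my real data. The `@rootFileId` and `@name` placeholders are written as `:rootFileId` and `:name` here, since that is the named style Python's `sqlite3` documents. Otherwise the SQL is the same as above.

```python
import sqlite3

# Same schema as in the question.
SCHEMA = """
CREATE TABLE [Files](
    [fileId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    [name] TEXT NOT NULL,
    [parentFileId] INTEGER NULL,
    UNIQUE ([name], [parentFileId])
);
CREATE INDEX [Files_parentFileId] ON [Files]([parentFileId]);
CREATE INDEX [Files_name] ON [Files]([name]);
"""

# Hypothetical sample rows: (fileId, name, parentFileId).
# Two separate trees, each containing a file named report.txt.
SAMPLE_ROWS = [
    (1, "root", None),
    (2, "docs", 1),
    (3, "report.txt", 2),
    (4, "other", None),
    (5, "report.txt", 4),
]

# Raw string so the '\' path separator reaches SQLite unchanged.
QUERY = r"""
WITH [@Context] AS (
    SELECT :rootFileId [fileId]
    UNION
    SELECT [Files].[fileId]
    FROM [Files]
    JOIN [@Context] ON [Files].[parentFileId] = [@Context].[fileId]
),
[@FilePath] AS (
    SELECT -1 [depth], [Files].[fileId], [Files].[parentFileId], [Files].[name]
    FROM [Files]
    WHERE [name] LIKE :name
    UNION
    SELECT [@FilePath].[depth] - 1, [@FilePath].[fileId],
           [Files].[parentFileId], [Files].[name]
    FROM [Files]
    JOIN [@FilePath] ON [Files].[fileId] = [@FilePath].[parentFileId]
)
SELECT [@FilePath].[fileId],
       [@FilePath].[name],
       GROUP_CONCAT([@FilePath].[parentFileId]) [pathFileId],
       GROUP_CONCAT([@FilePath].[name], '\') [pathname]
FROM [@FilePath]
JOIN [@Context] ON [@Context].[fileId] = [@FilePath].[fileId]
GROUP BY [@FilePath].[fileId]
ORDER BY [@FilePath].[depth];
"""


def run_search(root_file_id, name_pattern):
    """Build the sample table in memory and run the query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany(
            "INSERT INTO [Files] ([fileId], [name], [parentFileId]) VALUES (?, ?, ?)",
            SAMPLE_ROWS,
        )
        params = {"rootFileId": root_file_id, "name": name_pattern}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Only the report.txt under fileId 1 should come back, not the one under fileId 4.
    for row in run_search(1, "report%"):
        print(row)
```

On data this small the query finishes instantly, so this only demonstrates the expected shape of the result (one row per match, with its ancestors concatenated), not the slowdown. Note that the order of the concatenated path segments is not something I am relying on here.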