I'm seeing a significant performance difference between two PostgreSQL queries that I'm trying to understand and optimize. The table holds around 2TB of data, and both queries use a sequential scan. The first query executes quickly, while the second one takes more than 2 hours to complete. Here are the queries and some details:
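For context, the relevant part of the table looks roughly like this (only the columns the queries touch, with approximate types; the real table has many more columns):

-- Simplified sketch of my_table: the types are approximations,
-- and most columns are omitted.
CREATE TABLE my_table (
    user_id text,
    tags    text[]   -- array column whose length the queries sort on
    -- ...many other columns omitted...
);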
1st Query
WITH cte_mytable AS (
    SELECT *, coalesce(array_length(tags, 1), 0) AS size
    FROM my_table
    LIMIT 100
)
SELECT user_id, ...some other columns
FROM cte_mytable
WHERE cte_mytable.user_id = user_id
ORDER BY size DESC;
Explain log:
QUERY PLAN
---------------------------------------------------------------------------------------------------
Sort  (cost=13.02..13.27 rows=100 width=593)
  Sort Key: cte_my_table.size DESC
  ->  Subquery Scan on cte_my_table  (cost=0.00..9.70 rows=100 width=593)
        Filter: ((cte_my_table.user_id)::text IS NOT NULL)
        ->  Limit  (cost=0.00..8.70 rows=100 width=593)
              ->  Seq Scan on my_table  (cost=0.00..197608426.93 rows=2272637194 width=593)
2nd Query
SELECT *
FROM my_table
ORDER BY array_length(tags, 1) DESC
LIMIT 100;
Explain log:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit  (cost=217229180.49..217229192.16 rows=100 width=593)
  ->  Gather Merge  (cost=217229180.49..438195445.87 rows=1893864328 width=593)
        Workers Planned: 2
        ->  Sort  (cost=217228180.47..219595510.88 rows=946932164 width=593)
              Sort Key: (array_length(tags, 1)) DESC
              ->  Parallel Seq Scan on my_table  (cost=0.00..181037114.05 rows=946932164 width=593)
Version
PostgreSQL 14.6
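One thing I was considering (but have not tried on a table this size) is an expression index on the array length, so the ORDER BY ... LIMIT in the second query could read the top rows from the index instead of sorting roughly 2.3 billion rows. The index name below is just a placeholder:

-- Hypothetical expression index on the array length (untested here);
-- the double parentheses are required for an expression index.
CREATE INDEX CONCURRENTLY idx_my_table_tags_len
    ON my_table ((array_length(tags, 1)) DESC);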
Can someone help me understand why the second query is so slow while the first one is fast?