
Improving query performance with two inner joins on tables that have > 40M rows

Context

I have three tables: nodes, edges and sections. This is a graph network of paths from OpenStreetMap, where edges are created from two neighbouring 3D nodes (which give distance, elevation gain, etc.) and sections are collections of edges between decision nodes. In effect, sections build on edges, which build on nodes.

This is a Postgres 12.5 database running on the AWS RDS service on an m5.large instance.

In terms of size:

  • Nodes - 40M rows
  • Edges - 80M rows
  • Sections - 15M rows

The tables have the following indexes:

Nodes

  • nodeid

Edges

  • edgeid
  • edgestartnode
  • edgeendnode

Sections

  • sectionid
  • startnode
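For reference, the index setup listed above corresponds to something like the following DDL. This is a sketch inferred from the question: the actual index names and types aren't stated, and some of these columns (e.g. the ids) may in fact be primary keys rather than plain indexes.

```sql
-- Assumed definitions; names and uniqueness inferred, not confirmed.
CREATE INDEX ON nodes (nodeid);

CREATE INDEX ON edges (edgeid);
CREATE INDEX ON edges (edgestartnode);
CREATE INDEX ON edges (edgeendnode);

CREATE INDEX ON sections (sectionid);
CREATE INDEX ON sections (startnode);
```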

Aim

I have a routing algorithm that returns an ordered list of section ids, which I then want to use in a query (below) to pull the relevant information out of the database (from the nodes, edges, and sections tables) so that I can plot it on a map and display relevant information to the user. The list of sections can contain thousands of values; the query performance figures below relate to a list of 8,646 sections.

A db fiddle for the query below is here.

SELECT row_to_json(results)
FROM (
    SELECT id, edgeid, lat, lon, elevation AS elev, edgelength AS dist, edgegradient AS grad
    FROM (
        SELECT ord, sectionid AS id, unnest(edgelist) AS edges
        FROM unnest(ARRAY[IDs]) WITH ORDINALITY AS arg_list(id, ord)
        INNER JOIN sections s ON s.sectionid = id
        ORDER BY ord
    ) AS ed_list
    INNER JOIN edges e ON e.edgeid = ed_list.edges
    INNER JOIN nodes n ON n.nodeid = e.edgeendnode
) results;

Issue

If the database is 'cold' and hasn't been used in a geographical region recently, this query takes a very long time to run, producing the query plan here.

However, if the same or a similar query is run shortly afterwards, performance is much better, as shown in this plan: a 100x improvement.
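If the 100x gap really is down to the buffer cache, one mitigation worth noting is the pg_prewarm extension (supported on RDS PostgreSQL 12), which loads a relation into shared buffers ahead of the first query. This is only a sketch: whether it is practical depends on whether the hot parts of these tables fit in shared_buffers at all, and the index name below is hypothetical.

```sql
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Each call returns the number of blocks loaded into shared buffers.
SELECT pg_prewarm('sections');
SELECT pg_prewarm('edges');
SELECT pg_prewarm('nodes');

-- Indexes can be prewarmed too (name is hypothetical):
SELECT pg_prewarm('edges_edgeid_idx');
```

With 40M+ row tables, prewarming everything may evict more useful pages than it loads, so it is most useful when the instance has enough memory to hold the working set.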

Question

I've done a lot of reading about improving join performance and have tried implementing different techniques, such as the one here, including the following query (db fiddle here):

WITH sections_data(ord, id, edges) AS (
    SELECT ord, sectionid AS id, unnest(edgelist) AS edges
    FROM sections
    JOIN unnest(ARRAY[IDs]) WITH ORDINALITY t(sectionid, ord) USING (sectionid)
),
edge_data(ord, id, edgeid, end_node, dist, grad) AS (
    SELECT ord, id, edgeid, edgeendnode AS end_node, edgelength AS dist, edgegradient AS grad
    FROM edges, sections_data
    WHERE edgeid = edges
),
node_data(ord, id, edgeid, lat, lon, elev, dist, grad) AS (
    SELECT ord, id, edgeid, lat, lon, elevation AS elev, dist, grad
    FROM nodes, edge_data
    WHERE nodeid = end_node
)
SELECT row_to_json(results)
FROM (
    SELECT id, edgeid, lat, lon, elev, dist, grad
    FROM node_data
    ORDER BY ord
) results;

but when I tried it, the query came out slightly slower. So is there a better way to do this than my current approach?
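When comparing the two variants, it helps to see whether the time difference comes from disk reads or from the plan itself. Running both under EXPLAIN (ANALYZE, BUFFERS) shows shared-buffer hits versus reads per node, which distinguishes a genuinely worse plan from a cold-cache run; a minimal sketch:

```sql
-- "shared hit" = page already in shared_buffers; "read" = fetched from disk/OS cache.
EXPLAIN (ANALYZE, BUFFERS)
SELECT row_to_json(results)
FROM ( /* subquery from either variant above */ ) results;
```

If the CTE version shows similar hit/read ratios but more total buffers touched, the extra cost is in the plan rather than the cache.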

