I'm working on PostgreSQL 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries whose results combine data from several tables.
I'm thinking of two approaches: a single query with multiple joins, or two separate but simpler queries.
At a high level, and simplifying the structure, my schema is:
```sql
CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE addresses (
  id bigint NOT NULL,
  latitude double precision,
  longitude double precision,
  line1 character varying NOT NULL,
  "position" geography(Point, 4326),
  CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
);

CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");

CREATE TABLE locations (
  id bigint NOT NULL,
  uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
  address_id bigint NOT NULL
);

CREATE TABLE shops (
  id bigint NOT NULL,
  name character varying NOT NULL,
  location_id bigint NOT NULL
);

CREATE TABLE inventories (
  id bigint NOT NULL,
  shop_id bigint NOT NULL,
  status character varying NOT NULL
);
```
The `addresses` table contains the geographic data. The `position` column is computed from the lat/lng columns when rows are inserted or updated.
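PostgreSQL 9.6 has no generated columns (stored generated columns arrived in PostgreSQL 12), so a `BEFORE` trigger is one common way to keep `position` in sync. The question doesn't say how `position` is actually maintained (it could just as well be done in application code), so the following is only a sketch under that assumption; the names `addresses_set_position` and `addresses_set_position_trg` are hypothetical.

```sql
-- Hypothetical trigger function: derives "position" from longitude/latitude.
-- ST_MakePoint takes X (longitude) first, then Y (latitude).
CREATE OR REPLACE FUNCTION addresses_set_position() RETURNS trigger AS $$
BEGIN
  NEW."position" := ST_SetSRID(ST_MakePoint(NEW.longitude, NEW.latitude), 4326)::geography;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fires on inserts and on updates that touch the coordinate columns.
-- PostgreSQL 9.6 uses EXECUTE PROCEDURE (EXECUTE FUNCTION only exists from 11).
CREATE TRIGGER addresses_set_position_trg
  BEFORE INSERT OR UPDATE OF latitude, longitude ON addresses
  FOR EACH ROW EXECUTE PROCEDURE addresses_set_position();
```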
Each `address` is associated with one `location`. Each `location` can have many `shops`, and each `shop` has one `inventory`.
I've omitted them for brevity, but all tables have the proper foreign key constraints and B-tree indexes on the referencing columns.
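For completeness, the omitted constraints and indexes might look roughly like the sketch below. It assumes each `id` is the primary key; the index names only follow the `index_addresses_on_position` convention and are hypothetical. PostgreSQL does not index the referencing side of a foreign key automatically, so those B-tree indexes have to be created explicitly.

```sql
-- Assumption: every "id" column is the primary key of its table.
ALTER TABLE addresses   ADD PRIMARY KEY (id);
ALTER TABLE locations   ADD PRIMARY KEY (id);
ALTER TABLE shops       ADD PRIMARY KEY (id);
ALTER TABLE inventories ADD PRIMARY KEY (id);

-- Foreign keys along the address -> location -> shop -> inventory chain.
ALTER TABLE locations   ADD FOREIGN KEY (address_id)  REFERENCES addresses (id);
ALTER TABLE shops       ADD FOREIGN KEY (location_id) REFERENCES locations (id);
ALTER TABLE inventories ADD FOREIGN KEY (shop_id)     REFERENCES shops (id);

-- B-tree indexes on the referencing columns (hypothetical names); both the
-- joins and the "address_id IN (...)" lookup depend on them.
CREATE INDEX index_locations_on_address_id ON locations (address_id);
CREATE INDEX index_shops_on_location_id    ON shops (location_id);
CREATE INDEX index_inventories_on_shop_id  ON inventories (shop_id);
```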
The tables have hundreds of thousands of rows.
My main use case is served by this single query, which searches for `addresses` within 1000 meters of a central geographic point (`10.0, 10.0`) and returns data from all the tables:
```sql
SELECT
  s.id AS shop_id,
  s.name AS shop_name,
  i.status AS inventory_status,
  l.uuid AS location_uuid,
  a.line1 AS addr_line,
  a.latitude AS lat,
  a.longitude AS lng
FROM addresses a
JOIN locations l ON l.address_id = a.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(
  a.position,                             -- the position of each address
  ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
  1000,                                   -- radius distance in meters
  true                                    -- use_spheroid
);
```
This query works, and `EXPLAIN ANALYZE` shows that it correctly uses the GiST index on `position`.
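To compare the approaches on more than plan shape, `EXPLAIN` can also report buffer usage per plan node, and its hash and sort nodes report how much memory they used. A minimal sketch for the single query follows; the same options apply to each half of the split version. Keep in mind that `ANALYZE` actually executes the statement.

```sql
-- BUFFERS adds shared-buffer hits/reads per node; Hash and Sort nodes also
-- print their memory usage, which speaks to the memory question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  s.id AS shop_id,
  s.name AS shop_name,
  i.status AS inventory_status,
  l.uuid AS location_uuid,
  a.line1 AS addr_line,
  a.latitude AS lat,
  a.longitude AS lng
FROM addresses a
JOIN locations l   ON l.address_id = a.id
JOIN shops s       ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(a.position, ST_SetSRID(ST_Point(10.0, 10.0), 4326), 1000, true);
```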
However, I could also split this query in two and handle the intermediate results in the application. For example, this also works:
```sql
-- Search only for the addresses
SELECT
  a.id AS addr_id,
  a.line1 AS addr_line,
  a.latitude AS lat,
  a.longitude AS lng
FROM addresses a
WHERE ST_DWithin(
  a.position,                             -- the position of each address
  ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
  1000,                                   -- radius distance in meters
  true                                    -- use_spheroid
);

-- Get the rest of the data
SELECT
  s.id AS shop_id,
  s.name AS shop_name,
  i.status AS inventory_status,
  l.id AS location_id,
  l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE l.address_id IN (1, 2, 3, 4, 5) -- possibly thousands of values
;
```
where the values in `l.address_id IN (1, 2, 3, 4, 5)` come from the first query.
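If the split approach is kept, one alternative to building a literal `IN (...)` list with thousands of values is to bind all the ids as a single array parameter and filter with `= ANY(...)`. The sketch below uses SQL-level `PREPARE`/`EXECUTE` for illustration; an application would normally bind the array through its database driver instead. The statement name `shops_for_addresses` is hypothetical.

```sql
-- The ids arrive as one bigint[] parameter, so the statement text stays the
-- same no matter how many addresses the first query returned.
PREPARE shops_for_addresses(bigint[]) AS
SELECT
  s.id     AS shop_id,
  s.name   AS shop_name,
  i.status AS inventory_status,
  l.id     AS location_id,
  l.uuid   AS location_uuid
FROM locations l
JOIN shops s       ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE l.address_id = ANY ($1);

-- Example call with ids taken from the first query's result.
EXECUTE shops_for_addresses(ARRAY[1, 2, 3, 4, 5]::bigint[]);
```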
The query plans for the two separate queries look simpler than the plan for the single query, but I wonder whether that alone means the second approach is better.
I know that inner joins are well optimized and that a single round trip to the database is preferable.
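It is also possible to keep the two-step shape while still making a single round trip. In PostgreSQL 9.6 a CTE is always materialized and acts as an optimization fence (this changed in PostgreSQL 12), so the sketch below computes the nearby addresses first, much like the first of the two separate queries, and then joins the rest. Whether it beats the plain join is something only `EXPLAIN ANALYZE` on the real data can tell.

```sql
-- Step 1, materialized by the CTE in 9.6: addresses inside the radius.
WITH nearby AS (
  SELECT a.id, a.line1, a.latitude, a.longitude
  FROM addresses a
  WHERE ST_DWithin(a.position, ST_SetSRID(ST_Point(10.0, 10.0), 4326), 1000, true)
)
-- Step 2: join the remaining tables against that (usually small) set.
SELECT
  s.id        AS shop_id,
  s.name      AS shop_name,
  i.status    AS inventory_status,
  l.uuid      AS location_uuid,
  n.line1     AS addr_line,
  n.latitude  AS lat,
  n.longitude AS lng
FROM nearby n
JOIN locations l   ON l.address_id = n.id
JOIN shops s       ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id;
```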
But what about memory usage? Or resource contention on the tables (e.g. locks)?
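On the locking side, a plain `SELECT` takes only an `ACCESS SHARE` lock on each table and index it reads. That mode conflicts only with `ACCESS EXCLUSIVE` (taken by e.g. `DROP TABLE`, `TRUNCATE`, `VACUUM FULL` and many `ALTER TABLE` forms), not with concurrent `INSERT`, `UPDATE` or `DELETE`, so both approaches read-lock the same tables in the same mode. A sketch for observing this in your own session:

```sql
-- Run a read query inside an explicit transaction so its locks are still
-- held, then list the relation-level locks owned by this backend.
BEGIN;

SELECT a.id
FROM addresses a
WHERE ST_DWithin(a.position, ST_SetSRID(ST_Point(10.0, 10.0), 4326), 1000, true);

SELECT relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE pid = pg_backend_pid()
  AND locktype = 'relation';

-- Locks are released at the end of the transaction.
COMMIT;
```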