postgresql – Optimize query matching first N items of an array

You need index support to be fast. Tricky without re-designing your table. The following solution should perform excellently (microseconds instead of seconds), but requires some skill. Buckle up.

Expression index on IMMUTABLE function

Just take a few leading array elements, say 8. That should be very selective already. More would just make the index bigger, for little extra filtering.

Convert to a string. No separator. That allows false positives, but unlikely enough to matter. Eventually, we filter for exact results anyway.

Only IMMUTABLE functions are allowed in index expressions. But array_to_string() is only STABLE, not IMMUTABLE, because it takes anyarray and some element types don’t have an immutable text representation. We only deal with text (well, varchar(255) for no good reason, but all the same), and that is, in fact, immutable. But array_to_string() doesn’t know that.

So we could “fake” an immutable wrapper function. That’s possible for any user who can create functions:

CREATE OR REPLACE FUNCTION public.f_8moves(text())
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT array_to_string($1(1:8), '')
$func$;

Or, better yet, define an honestly IMMUTABLE variant of array_to_string() for only text() input, directly based on the underlying C function. Faster and cleaner. Let’s call it array_to_string_immutable() to be clear. That requires superuser privileges:

-- SET ROLE postgres;  -- you must be superuser

CREATE OR REPLACE FUNCTION public.array_to_string_immutable(text(), text)
 RETURNS text
 LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT AS 'array_to_text';

The rest works without superuser privileges.

CREATE OR REPLACE FUNCTION public.f_8moves(text())
  RETURNS text
  LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT public.array_to_string_immutable($1(1:8), '')
$func$;

Related:

Either way, we now have a function public.f_8moves(text()) to be used in the following index:

CREATE INDEX records_8moves_idx ON records (public.f_8moves(moves) COLLATE "C");

COLLATE "C" is exactly what we need to allow left-anchored (your expressed requirement) LIKE expressions. See:

If a major percentage of rows has evaluation IS NOT NULL, add that filter to the index, to make it a partial index on top:

CREATE INDEX records_8moves_idx ON records (public.f_8moves(moves) COLLATE "C")
WHERE evaluation IS NOT NULL;

Query

SELECT evaluation(10)
FROM   records
WHERE  public.f_8moves(moves) = public.f_8moves('{b4,e5,Bb2,d6,Nf3,Nf6,g3,Bg4,Bg2,h5}') COLLATE "C"
AND    moves(1:10) = '{b4,e5,Bb2,d6,Nf3,Nf6,g3,Bg4,Bg2,h5}'
AND    evaluation IS NOT NULL;

COLLATE "C" is required to match the index.

Use the expression public.f_8moves(moves) to match the functional index. The right side of the expression can be given as string directly. But use the same function for convenience.

Then add original exact filters to narrow the results down to exact matches:

AND    moves(1:10) = '{b4,e5,Bb2,d6,Nf3,Nf6,g3,Bg4,Bg2,h5}'

Looks redundant, is logically redundant, but allows to use the index to great effect.

For arrays with less than 8 elements (our indexed lead), or generally, use LIKE with a left-anchored pattern:

SELECT evaluation(10)
FROM   records
WHERE  public.f_8moves(moves) LIKE (public.f_8moves('{b4,e5}') || '%') COLLATE "C"  -- COLLATE "C" is optional for LIKE
AND    moves(1:2) = '{b4,e5}'
AND    evaluation IS NOT NULL;

db<>fiddle here

Similar, with more explanation:

Cluster table rows

It will generally help performance to physically sort your table rows, so that each query can read results from one or few data pages, instead of fetching from all over the place.

Use CLUSTER of the non-blocking alternatives pg_repack or pg_squeeze More in the above link.