postgresql – Drop intermediate results of recursive query in Postgres

I’m trying to aggregate the contents of a column from a directed acyclic graph (DAG) stored in a Postgres table.

Each dag row has an id, bytes, and may also have a parent referencing another row. I’m trying to write a function which identifies every “tip” in the graph starting from a particular parent id. The result needs to include each graph tip and its aggregated collected_bytes from the parent id to that tip. (The DAG can be very deep, so some collected_bytes arrays can have millions of elements.)

The function below works, but memory usage grows quadratically as the collected_bytes get longer. The results CTE keeps a copy of every iteration of collected_bytes until the end of the query, then the ranked CTE is used to select only the deepest node for each tip.

I think I’m approaching this incorrectly: how can I do this more efficiently?

Is it possible to instruct Postgres to drop the intermediate results as the recursive query is running? (So we can also skip the ranked CTE?) Is there a more natural way to achieve the result below?

DROP TABLE IF EXISTS dag;
CREATE TABLE dag (
    id bigint PRIMARY KEY,
    parent bigint,
    bytes bytea NOT NULL,
    FOREIGN KEY (parent) REFERENCES dag(id)
);
INSERT INTO dag (id, parent, bytes) VALUES  (0, NULL, 'x0000'),
                                            (1, NULL, 'x0100'),
                                            (2, 0, 'x0200'),
                                            (3, 1, 'x0300'),
                                            (4, 2, 'x0400'),
                                            (5, 3, 'x0500'),
                                            (6, 4, 'x0600'),
                                            (7, 5, 'x0700'),
                                            (8, 4, 'x0800');

DROP FUNCTION IF EXISTS get_descendant;
CREATE FUNCTION get_descendant (input_id bigint)
    RETURNS TABLE(start_id bigint, end_id bigint, collected_bytes bytea(), depth bigint)
    LANGUAGE sql STABLE
    AS $$
    WITH RECURSIVE results AS (
            SELECT id AS start_id, id AS end_id, ARRAY(bytes) AS collected_bytes, 0 AS depth
                FROM dag WHERE id = input_id
        UNION ALL
            SELECT start_id,
                   dag.id AS end_id,
                   collected_bytes || dag.bytes AS collected_bytes,
                   depth + 1 AS depth
                FROM results INNER JOIN dag
                    ON results.end_id = dag.parent
                WHERE depth < 100000
    ),
    ranked AS (
        SELECT *, rank() over (PARTITION BY start_id ORDER BY start_id, depth DESC) FROM results
    )
    SELECT start_id, end_id, collected_bytes, depth FROM ranked WHERE rank = 1;
$$;

Here’s the result for 0, which has two valid tips, an id of 6 and 8. The collected_bytes field is the aggregation of bytes along each path:

postgres=# SELECT get_descendant.* FROM get_descendant(0::bigint);
 start_id | end_id |              collected_bytes              | depth 
----------+--------+-------------------------------------------+-------
        0 |      6 | {"\x0000","\x0200","\x0400","\x0600"} |     3
        0 |      8 | {"\x0000","\x0200","\x0400","\x0800"} |     3
(2 rows)

While here are the intermediate results before being ranked (and only the maximum depths selected):

postgres=# SELECT get_descendant.* FROM get_descendant(0);
 start_id | end_id |              collected_bytes              | depth 
----------+--------+-------------------------------------------+-------
        0 |      0 | {"\x0000"}                               |     0
        0 |      2 | {"\x0000","\x0200"}                     |     1
        0 |      4 | {"\x0000","\x0200","\x0400"}           |     2
        0 |      6 | {"\x0000","\x0200","\x0400","\x0600"} |     3
        0 |      8 | {"\x0000","\x0200","\x0400","\x0800"} |     3
(5 rows)

As you can see, this implementation is already wasting ~half of the memory in use. How can I make this more memory efficient?

Thanks!