postgresql – how can my Patroni cluster split read and write queries with a load balancer?

I want to split the queries at the load balancer level or on the Patroni servers, because we cannot fix or add a connection string in the app.
Does Pgpool-II not work with Patroni? Should I use two connection strings for my app? Can SELECT queries not go to the secondary automatically?

node_id |    hostname     | port | status | lb_weight |  role  | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+-----------------+------+--------+-----------+--------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | 192.168.118.138 | 5432 | up     | 0.400000  | master | 315        | false             | 0                 |                   |                        | 2021-04-18 15:27:05
 1       | 192.168.118.139 | 5432 | up     | 0.600000  | slave  | 0          | true              | 0                 |                   |                        | 2021-04-18 15:27:05
(2 rows)
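Pgpool-II can do this kind of read/write splitting in front of the cluster, so the app keeps a single connection string pointing at pgpool. As a hedged sketch (only the load-balancing settings, with backend addresses and weights taken from the pool_nodes output above; everything else omitted):

# pgpool.conf (sketch; only the load-balancing pieces)
load_balance_mode = on                  # distribute read-only SELECTs across backends by weight

backend_hostname0 = '192.168.118.138'   # node 0 (current Patroni primary)
backend_port0     = 5432
backend_weight0   = 0.4

backend_hostname1 = '192.168.118.139'   # node 1 (current Patroni replica)
backend_port1     = 5432
backend_weight1   = 0.6

Whether a given SELECT is actually sent to the replica also depends on session state (for example, statements in an explicit transaction that has already written stay on the primary), so a select_cnt of 0 on one node does not by itself mean load balancing is broken.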

postgresql – Generic data type that belongs to “something”, where the same data type/shape can belong to multiple tables

In Postgres one often has one-to-many relationships, where multiple rows in table B refer to one row in table A.

But can this also be used to generalize a data shape, where I have a generic shape of data stored in table B, but it can belong either to an entity in table A or to one in table C? Is there a clean way to do this without violating DRY and without redundant "empty" fields?

Say I have a concept "time period", which would include an id, start_date and end_date. Now I have two tables A and C where each entry in both tables has a time period; beyond that, there is little in common between those tables.

The solution I used to reach for is a time_period_a and a time_period_c table that copy the format of the time period. But this violates DRY, and I've experienced a lot of trouble with this violation (say, in one of the tables I forgot to update the dates to include a timezone).

I've also thought about adding reference columns for both A and C and setting them to NULL when not needed. But this creates a lot of trouble, where I have to add either triggers or extra complexity to prevent both being NULL or both being non-NULL.

What is the canonical solution for this? Is there a way to store, in the field itself rather than in the column, that the value refers to an id in a specific table, instead of a generic number whose target table is only implied by the column / foreign key constraint?
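As a hedged sketch of the nullable-reference variant described two paragraphs up (the parent table names are made up), a CHECK constraint can enforce "exactly one owner" without triggers:

CREATE TABLE table_a (id bigint PRIMARY KEY);   -- hypothetical parent A
CREATE TABLE table_c (id bigint PRIMARY KEY);   -- hypothetical parent C

-- one shared time_period table, referenced by both parents
CREATE TABLE time_period (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    start_date timestamptz NOT NULL,
    end_date   timestamptz NOT NULL,
    a_id       bigint REFERENCES table_a (id),
    c_id       bigint REFERENCES table_c (id),
    -- exactly one owner: one reference NULL, the other set
    CHECK ((a_id IS NULL) <> (c_id IS NULL))
);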

How to represent a list of entities within a table of the same entity in PostgreSQL?

There are a couple of ways you can go about this, but the most relational and normalized way would be to create a second table called UserFriendList with the columns UserId and FriendUserId, storing one row per friend for each user. This table is one-to-many from User.Id to UserFriendList.UserId, and UserFriendList.FriendUserId can be joined back to User.Id to get all the User attributes of the friends. This kind of table is known as a bridge / junction / linking table.
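A minimal sketch of that junction table (the column types and the sample query are illustrative; "User" is quoted because user is a reserved word in Postgres):

CREATE TABLE "User" (
    Id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    Name text NOT NULL
);

-- bridge / junction table: one row per (user, friend) pair
CREATE TABLE UserFriendList (
    UserId       bigint NOT NULL REFERENCES "User" (Id),
    FriendUserId bigint NOT NULL REFERENCES "User" (Id),
    PRIMARY KEY (UserId, FriendUserId)
);

-- all friends of user 42, with their full User attributes
SELECT u.*
FROM UserFriendList f
JOIN "User" u ON u.Id = f.FriendUserId
WHERE f.UserId = 42;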

Alternatively, you can store a FriendList column directly on the User table, either as a comma-delimited list or as JSON, but both are denormalized solutions: they make changes harder to maintain, can lead to data redundancy, and inflate the size of your User table, which can make querying it less efficient.

postgresql – Local Postgres Server on Mac Pw Auth Fail

Disclaimer: I am new to macOS and haven't found any useful related question.

After successfully installing PostgreSQL 13.2 via Homebrew on macOS 11.2.1 (Big Sur), I run into the following problem:

Using the terminal, I run the command

psql postgres

It prompts for the password of my standard Mac user account.

Returns:

psql: error: FATAL:  password authentication failed for user "standarduser"

Try:

sudo psql postgres

Now it asks for the standard user password and accepts it.
Then it asks for the root user password and rejects it with the same error as before:
Returns:

psql: error: FATAL:  password authentication failed for user "root"

What am I missing?

postgresql – Postgres connection times out on LAN, but not WAN

SETUP

Using Postgres 11 and the pg-promise npm module.

I have a local network of about 5 machines all running on a Class C 192.168.1.0/24 configuration.

My Postgres instance runs on 192.168.1.A and is accessible externally through NAT and firewall rules; my Heroku API connects to Postgres on 192.168.1.A perfectly.

On 192.168.1.B, inside my network, I have the dev version of the API running on a different machine, much like the Heroku one.

The Problem

The connection from 192.168.1.B is persistently giving me a timeout error. The machines are in the same physical location, and the setup is very simple, so I'm not sure how to track down why I am getting this timeout.

pg_hba.conf has these lines:

host    all             all             127.0.0.1/32            md5
host    all             all             0.0.0.0/0               md5
host    all             all             192.168.1.0/24          md5

Heroku log says:

2021-04-16T22:46:10.973041+00:00 heroku[router]: at=info method=GET path="/api/locations" host=<host> request_id=<id> fwd="<an ip>" dyno=web.1 connect=1ms service=735ms status=304 bytes=182 protocol=https

Express on the local dev machine says:

GET /api/locations 500 31647.158 ms - 37

I'm stumped as to why this would happen. The WAN connection works, but the LAN connection times out?
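Not part of the original post, but a quick way to narrow this down is to test reachability and authentication directly from 192.168.1.B, outside the API (the user/database names below are placeholders):

# run on 192.168.1.B
pg_isready -h 192.168.1.A -p 5432       # does the server answer at all on the LAN?
psql "host=192.168.1.A port=5432 dbname=postgres user=someuser connect_timeout=5" -c 'select 1'

If pg_isready also hangs, the problem is at the network/firewall level rather than in Postgres or pg-promise; if it answers but psql fails, pg_hba.conf and authentication are the next places to look.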

postgresql – Postgres view role permissions

I am trying to work out how database roles work.
Here is my use case…

create role user1 login password 'user1';

create schema authorization user1;

CREATE ROLE DEV_ROLE;

grant connect, temporary on database test to DEV_ROLE;
GRANT ALL ON SCHEMA manager TO DEV_ROLE;

Now, how do I view all the grants assigned to DEV_ROLE? This is possible in Oracle, but I'm trying to work out how to do the same here.

Do appreciate your reply.
Ta
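Not part of the original question, but as a sketch: PostgreSQL has no single Oracle-style "all grants for a role" view, so it usually takes a few catalog / information_schema queries (note that an unquoted DEV_ROLE is stored in lowercase as dev_role):

-- table-level privileges granted to the role
SELECT grantee, table_schema, table_name, privilege_type
FROM information_schema.role_table_grants
WHERE grantee = 'dev_role';

-- database-level ACLs (CONNECT, TEMPORARY, ...) and schema-level ACLs
SELECT datname, datacl FROM pg_database WHERE datname = 'test';
SELECT nspname, nspacl FROM pg_namespace WHERE nspname = 'manager';

-- in psql, \du+ additionally lists roles with their attributes and memberships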

postgresql – Very slow query for massive stats calculation

In our app, we need to calculate the price comparison between a listing and its applicable comparables. This needs to happen for all active listings in the DB on a daily basis. The number of listings we are talking about is anywhere between 100k and 200k (per day).

The idea is that we calculate two comparables (in the city and in the area) and add these records to a log table for further use.

Initially, we created a query that builds this log record for a single listing and managed everything else in code. That worked great, but it added a lot of complexity and was slow overall. We ended up playing catch-up.

The next step was to create a single query that creates the price logs for all listings. It looks like this:

with listings_to_process as (
  select * from listing l  
  where 
    l.status IN ('live', 'updated') 
    and l.deleted_date is null
    and not exists (
      select * from price_log pl where pl.date = CURRENT_DATE and pl.listing_id = l.id
    )
)
insert into price_log (listing_id, date, price, city_average, subdivision_average)
select 
  ltp.id as listing_id, 
  CURRENT_DATE as date, 
  (select list_price from listing where id = ltp.id) as price,
  (
    select avg(list_price) from listing 
    where 
      status IN ('live', 'updated') 
      and deleted_date is null 
      and id != ltp.id 
      and country = ltp.country 
      and city = ltp.city 
      and type = ltp.type 
      and ownership_type = ltp.ownership_type 
      and (ltp.bedrooms is null or ltp.bedrooms = 0 or bedrooms = ltp.bedrooms)
      and (ltp.living_area is null or ltp.living_area = 0 or living_area <@ int4range((ltp.living_area - 150), (ltp.living_area + 150)))
      and (ltp.year_built is null or ltp.year_built = 0 or year_built <@ int4range((ltp.year_built - 5), (ltp.year_built + 5)))
  ) as city_average,
  (
    select avg(list_price) from listing 
    where 
      status IN ('live', 'updated') 
      and deleted_date is null 
      and id != ltp.id 
      and country = ltp.country 
      and city = ltp.city 
      and subdivision = ltp.subdivision
      and type = ltp.type 
      and ownership_type = ltp.ownership_type 
      and (ltp.bedrooms is null or ltp.bedrooms = 0 or bedrooms = ltp.bedrooms)
      and (ltp.living_area is null or ltp.living_area = 0 or living_area <@ int4range((ltp.living_area - 150), (ltp.living_area + 150)))
      and (ltp.year_built is null or ltp.year_built = 0 or year_built <@ int4range((ltp.year_built - 5), (ltp.year_built + 5)))
  ) as subdivision_average
from listings_to_process ltp
on conflict (listing_id,date)
do nothing;

Logically it works, and it is quite fast for small datasets. On the full DB it runs forever, and I can't figure out how to improve it any further.

Here is the explain for that:

Insert on price_log  (cost=1295.97..609417575.98 rows=50654 width=36)
  Conflict Resolution: NOTHING
  Conflict Arbiter Indexes: price_log_listing_id_date_idx
  ->  Hash Anti Join  (cost=1295.97..609417575.98 rows=50654 width=36)
        Hash Cond: (l.id = pl.listing_id)
        ->  Index Only Scan using lising_price_stats_idx on listing l  (cost=0.42..10192.64 rows=64604 width=67)
              Index Cond: (status = ANY ('{live,updated}'::text[]))
        ->  Hash  (cost=748.39..748.39 rows=33293 width=4)
              ->  Seq Scan on price_log pl  (cost=0.00..748.39 rows=33293 width=4)
                    Filter: (date = CURRENT_DATE)
        SubPlan 1
          ->  Bitmap Heap Scan on listing  (cost=1.43..2.44 rows=1 width=8)
                Recheck Cond: (id = l.id)
                ->  Bitmap Index Scan on listing_pkey  (cost=0.00..1.43 rows=1 width=0)
                      Index Cond: (id = l.id)
        SubPlan 2
          ->  Aggregate  (cost=6007.42..6007.43 rows=1 width=8)
                ->  Bitmap Heap Scan on listing listing_1  (cost=427.62..6007.42 rows=1 width=8)
                      Recheck Cond: (((type)::text = (l.type)::text) AND (deleted_date IS NULL))
                      Filter: (((status)::text = ANY ('{live,updated}'::text[])) AND (id <> l.id) AND (country = l.country) AND ((city)::text = (l.city)::text) AND ((ownership_type)::text = (l.ownership_type)::text) AND ((l.bedrooms IS NULL) OR (l.bedrooms = 0) OR (bedrooms = l.bedrooms)) AND ((l.living_area IS NULL) OR (l.living_area = 0) OR (living_area <@ int4range((l.living_area - 150), (l.living_area + 150)))) AND ((l.year_built IS NULL) OR (l.year_built = 0) OR (year_built <@ int4range((l.year_built - 5), (l.year_built + 5)))))
                      ->  Bitmap Index Scan on listing_type_idx  (cost=0.00..427.62 rows=5360 width=0)
                            Index Cond: ((type)::text = (l.type)::text)
        SubPlan 3
          ->  Aggregate  (cost=6020.82..6020.83 rows=1 width=8)
                ->  Bitmap Heap Scan on listing listing_2  (cost=427.62..6020.82 rows=1 width=8)
                      Recheck Cond: (((type)::text = (l.type)::text) AND (deleted_date IS NULL))
                      Filter: (((status)::text = ANY ('{live,updated}'::text[])) AND (id <> l.id) AND (country = l.country) AND ((city)::text = (l.city)::text) AND (subdivision = l.subdivision) AND ((ownership_type)::text = (l.ownership_type)::text) AND ((l.bedrooms IS NULL) OR (l.bedrooms = 0) OR (bedrooms = l.bedrooms)) AND ((l.living_area IS NULL) OR (l.living_area = 0) OR (living_area <@ int4range((l.living_area - 150), (l.living_area + 150)))) AND ((l.year_built IS NULL) OR (l.year_built = 0) OR (year_built <@ int4range((l.year_built - 5), (l.year_built + 5)))))
                      ->  Bitmap Index Scan on listing_type_idx  (cost=0.00..427.62 rows=5360 width=0)
                            Index Cond: ((type)::text = (l.type)::text)
JIT:
  Functions: 49
  Options: Inlining true, Optimization true, Expressions true, Deforming true

As you can see, the cost is gigantic, and I can't get rid of the Hash Anti Join.
Is there any way to make it more efficient?
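Not a definitive fix, but since both correlated subqueries filter on the same equality columns plus the same live/not-deleted predicate, one thing worth trying is a partial covering index so each per-listing average is answered from a narrow index range. A sketch, using only columns that appear in the query (adjust to the real schema):

-- INCLUDE needs PostgreSQL 11+; on older versions append the columns to the key instead
CREATE INDEX listing_comparables_idx
    ON listing (country, city, type, ownership_type, subdivision)
    INCLUDE (list_price, bedrooms, living_area, year_built)
    WHERE status IN ('live', 'updated') AND deleted_date IS NULL;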

postgresql – Same query taking different execution times on equivalent RDS instances

The train table has around 60 million records, all of which have their updated_at timestamp in the year 2021. The schema of the table is below.

                         Table "public.train"
     Column     |            Type             | Collation | Nullable | Default 
----------------+-----------------------------+-----------+----------+---------
 id             | uuid                        |           | not null | 
 updated_at     | timestamp without time zone |           | not null | 
 status         | task_status                 |           | not null | 
 status_details | json                        |           |          | 
Indexes:
    "train_id_updated_at_key" UNIQUE CONSTRAINT, btree (id, updated_at)

I have created a table train_temp with the same schema, but partitioned on updated_at with partition key RANGE (date(updated_at)).

I ran the query below to add the train table as a partition of train_temp on an RDS instance that is connected to the application server.

  ALTER TABLE train_temp ATTACH partition train FOR VALUES FROM ('2021-01-01') to ('2022-01-01');

The query takes around 5 minutes.

Before running the query I had taken a snapshot of the RDS instance. I restored the snapshot on a couple of instances with the same configurations as the original RDS instance.

But when I run the query on the new instances, it takes around 1 hour, which is very bizarre since it is the same data on an equivalent instance.

Because of the uncertainty in the execution time, I am not able to come up with a concrete plan on how I could take this to production.

Any help would be much appreciated.
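This does not explain the instance-to-instance difference, but regarding the ATTACH step itself: PostgreSQL can skip the full validation scan of the table being attached if the table already carries a CHECK constraint that implies the partition bound. A sketch matching the RANGE (date(updated_at)) key above (the constraint name is made up):

-- add a constraint implying the partition bound, so ATTACH can skip
-- the validation scan of the 60M-row table
ALTER TABLE train ADD CONSTRAINT train_2021_check
    CHECK (date(updated_at) >= DATE '2021-01-01'
           AND date(updated_at) < DATE '2022-01-01');

ALTER TABLE train_temp ATTACH PARTITION train
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

-- the helper constraint can be dropped once the partition is attached
ALTER TABLE train DROP CONSTRAINT train_2021_check;

Adding the constraint still scans the table once, but it separates that scan from the ATTACH itself.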

postgresql – An update operation that would target 0 rows, aborted by timeout

I have an update operation that is being executed on a fairly large table (PostgreSQL 10):

UPDATE contacts_trashbin 
SET op_id='x60771801dbed2d6e021ec067'::bytea 
WHERE contacts_trashbin.company_id = 'x577f0cd198d4e67e170d31e1'::bytea 
  AND contacts_trashbin.deleted <= '2021-02-13T16:27:45.599860'::timestamp 
  AND (contacts_trashbin.op_id = 'x000000000000000000000000'::bytea 
       OR contacts_trashbin.removing IS true)

This request times out after 60 seconds (the hard limit we have on our server for all operations).
What’s odd about this timeout is that the select with the same filter conditions:

select count(*) from contacts_trashbin
WHERE contacts_trashbin.company_id = 'x577f0cd198d4e67e170d31e1'::bytea
  AND contacts_trashbin.deleted <= '2021-02-13T16:27:45.599860'::timestamp
  AND (contacts_trashbin.op_id = 'x000000000000000000000000'::bytea OR contacts_trashbin.removing IS true)

takes only 71ms and returns 0.

There are several indexes on this table, one of them is

    create index ix_company_id_deleted on contacts_trashbin (company_id, deleted);

The total number of rows WHERE company_id='x577f0cd198d4e67e170d31e1' is ~600k if it matters.

Any ideas why it could be happening?
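A diagnostic sketch, not from the original post: two cheap things to check are whether the planner picks a different access path for the UPDATE than for the fast SELECT, and whether the hanging UPDATE is actually waiting on a lock held by another session.

-- plain EXPLAIN (without ANALYZE) does not execute the UPDATE
EXPLAIN
UPDATE contacts_trashbin
SET op_id = 'x60771801dbed2d6e021ec067'::bytea
WHERE contacts_trashbin.company_id = 'x577f0cd198d4e67e170d31e1'::bytea
  AND contacts_trashbin.deleted <= '2021-02-13T16:27:45.599860'::timestamp
  AND (contacts_trashbin.op_id = 'x000000000000000000000000'::bytea
       OR contacts_trashbin.removing IS true);

-- from another session while the real UPDATE is hanging (PostgreSQL 9.6+)
SELECT pid, state, wait_event_type, wait_event,
       pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE state <> 'idle';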

postgresql – How do I add a where statement at the end of query for a column that does not have a table to reference from?

Use a WITH clause (CTE) to build the result set, then SELECT only the wanted rows from it.

WITH CTE As (
select
CASE WHEN (to_char(((case when v.trip_order_start_date is null then case when v.manual_start_date is null then t.required_pickup_date else v.trip_order_start_date end else v.trip_order_start_date end  + (interval  '-1 hours' * ofc.offset)) at time zone 'utc' at time zone 'America/Chicago' + (interval  '-1 hours' * ofc.offset)) at time zone 'utc' at time zone 'America/Chicago', 'YYYYMM') < to_char((now() at time zone 'utc' at time zone 'America/Chicago'), 'YYYYMM')
        AND(coalesce(posted_date, paid_date) IS NULL
            OR to_char((coalesce(v.posted_date, v.paid_date) + (interval  '-1 hours' * ofc.offset)) at time zone 'utc' at time zone 'America/Chicago', 'YYYYMM') > to_char((now() at time zone 'utc' at time zone 'America/Chicago' - interval '1' month), 'YYYYMM'))) THEN 'YES' ELSE 'NO' END AS accrual
) 
SELECT * FROM CTE WHERE accrual = 'YES'

Or use the SELECT query as a subselect in a FROM clause and filter on that:

SELECT * FROM (
    select
    CASE WHEN (to_char(((case when v.trip_order_start_date is null then case when v.manual_start_date is null then t.required_pickup_date else v.trip_order_start_date end else v.trip_order_start_date end  + (interval  '-1 hours' * ofc.offset)) at time zone 'utc' at time zone 'America/Chicago' + (interval  '-1 hours' * ofc.offset)) at time zone 'utc' at time zone 'America/Chicago', 'YYYYMM') < to_char((now() at time zone 'utc' at time zone 'America/Chicago'), 'YYYYMM')
            AND(coalesce(posted_date, paid_date) IS NULL
                OR to_char((coalesce(v.posted_date, v.paid_date) + (interval  '-1 hours' * ofc.offset)) at time zone 'utc' at time zone 'America/Chicago', 'YYYYMM') > to_char((now() at time zone 'utc' at time zone 'America/Chicago' - interval '1' month), 'YYYYMM'))) THEN 'YES' ELSE 'NO' END AS accrual
    ) t1 
WHERE accrual = 'YES'

In both cases the inner SELECT needs to be a valid query on its own; the snippets above keep your CASE expression but omit the FROM/JOIN part, which you will need to carry over from your original query.
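As a self-contained illustration of the pattern (the orders table and its columns are made up, since the original FROM clause was not shown):

-- hypothetical table, just to show the shape of the query
CREATE TABLE orders (id int PRIMARY KEY, shipped_date date);

WITH cte AS (
    SELECT id,
           CASE WHEN shipped_date IS NULL THEN 'YES' ELSE 'NO' END AS pending
    FROM orders
)
SELECT * FROM cte WHERE pending = 'YES';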