postgresql – Several Postgres partitioning questions (hierarchical partitioning, HASH, PK order)

I’m pretty new to Postgres and to RDBMS partitioning. I have a few related questions about the performance benefits of partitioning.

Background: I am trying to fix slow batch queries on a 4 column table with over 300 million rows. The PK is on all columns and the 3 columns that aren’t at the beginning of the PK also have single column indexes.

The indexes are growing out of control and are collectively larger than all RAM, and 3x the size of the underlying data. Batches regularly read and write a couple hundred million rows and they are slow.

What I have already done/understand: I’ve analyzed the code and determined that only the first and last columns are used in SELECT WHERE clauses. Other than INSERTs, no other queries hit this table (besides a DELETE that will be replaced with partition dropping). So I already know I can drop two of the three indexes since they are unused, which will lead to reduced index size, fewer indexes, and hopefully improved INSERT/SELECT performance.

In addition to dropping the unused indexes, I am going to use partitioning to replace a lengthy DELETE statement (which has a WHERE on the two middle columns) with, instead, LIST partitioning on those columns so I can DROP partitions.

Where I have questions: My proposed LIST partition also breaks up the table somewhat, so the partitions are no more than 1/4 the size of the original table. However, even that size is very large. I am considering adding a second hierarchical partition layer with a HASH partition on the 4th column, which could further considerably reduce partition size. My hope is that this would further reduce RAM consumption and/or improve performance in other ways.

However, I have a few questions about this:

  1. While I would be reducing partition size, my queries actually select on nearly every value in the column where I would be using HASH partitioning. I don’t know the distribution of those selected values, so there might be hot and cold partitions, but few totally cold partitions. In this case, would HASH partitioning even help, or would it be no improvement over the single level of partitioning? It’s kind of confusing to me, because since hashing is random by design, I would expect this to be a problem whenever HASH is used.

The only reason I think it might help is by reducing the height of the index trees, since the indexes exist within partitions. But all indexes would still be used. Basically what I’m wondering is, is it still an improvement to reduce index height even if all the indexes may be used, versus one huge index? Is there a best practice as to when HASH might help?

  1. The upper level of the partition hierarchy is a LIST on the two middle columns because that’s how my DELETE (that I am replacing) is defined. But those columns are never used in WHERE clauses. Is it a problem if the top partition hierarchy level isn’t even used in WHERE clauses? I could reverse the order so the HASH is the parent partition, since I actually select by that, but then I’d need to drop multiple child partitions when I DROP instead of one parent partition. Should I just bite the bullet and switch the order?

  2. Is it best for the order of the partition hierarchy to match the order of the PK? In other words, if my parent partitioning is LIST(col2, col3) and my child partitioning is HASH(col4), should I change my PK from 1,2,3,4 to 2,3,4,1 to match, or does it not matter?