PostgreSQL 13 is slower than PostgreSQL 11

Preamble: after reading some Stack Exchange posts, I set `jit` to off and got an improvement, but I still have an issue:

I set up two databases on the same machine with the same content (pg_dump | psql).

I run a query that joins two tables and applies some filters:

explain analyse SELECT count(e0."begins_on") FROM "events" AS e0
INNER JOIN "addresses" AS a1 ON a1."id" = e0."physical_address_id"
WHERE ST_DWithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography, a1."geom"::geography, '25000');

On PG 11:

                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=59193.29..59193.30 rows=1 width=8) (actual time=1186.086..1211.660 rows=1 loops=1)
   ->  Gather  (cost=59193.07..59193.28 rows=2 width=8) (actual time=1185.972..1211.555 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=58193.07..58193.08 rows=1 width=8) (actual time=995.269..995.272 rows=1 loops=3)
               ->  Nested Loop  (cost=0.42..58190.98 rows=837 width=8) (actual time=10.212..992.982 rows=1858 loops=3)
                     ->  Parallel Seq Scan on addresses a1  (cost=0.00..51272.61 rows=1472 width=8) (actual time=10.125..929.565 rows=1960 loops=3)
                            Filter: (((geom)::geography && '0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography) AND ('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography && _st_expand((geom)::geography, '25000'::double precision)) AND _st_dwithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography, (geom)::geography, '25000'::double precision, true))
                           Rows Removed by Filter: 86378
                     ->  Index Scan using idx_events_addresses on events e0  (cost=0.42..4.69 rows=1 width=16) (actual time=0.024..0.026 rows=1 loops=5879)
                           Index Cond: (physical_address_id = a1.id)
 Planning Time: 3.143 ms
 Execution Time: 1211.964 ms
(13 rows)

On PG13:

                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1643039.16..1643039.17 rows=1 width=8) (actual time=3537.701..3574.439 rows=1 loops=1)
   ->  Gather  (cost=1643038.94..1643039.15 rows=2 width=8) (actual time=3535.611..3574.404 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=1642038.94..1642038.95 rows=1 width=8) (actual time=3385.433..3385.437 rows=1 loops=3)
               ->  Nested Loop  (cost=0.42..1642038.93 rows=6 width=8) (actual time=86.443..3382.515 rows=1858 loops=3)
                     ->  Parallel Seq Scan on events e0  (cost=0.00..31114.85 rows=62785 width=16) (actual time=0.090..290.022 rows=50233 loops=3)
                     ->  Index Scan using addresses_pkey on addresses a1  (cost=0.42..25.66 rows=1 width=8) (actual time=0.057..0.057 rows=0 loops=150700)
                           Index Cond: (id = e0.physical_address_id)
                           Filter: st_dwithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography, (geom)::geography, '25000'::double precision, true)
                           Rows Removed by Filter: 1
 Planning Time: 3.122 ms
 Execution Time: 3574.765 ms
(13 rows)

We see that PG 11 and PG 13 take opposite approaches (a parallel scan on addresses followed by an index scan on events, versus a parallel scan on events followed by an index scan on addresses), and that PG 11 wins.

I checked that the tables have the same characteristics (row counts, indexes), ran ANALYZE VERBOSE; on both, and reran the query. Same results.

Any hints?
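One experiment worth trying (my suggestion, not something the post establishes): PG 13's join estimate (rows=6 versus the actual 1858) is far off, so you can check whether this is purely a planner choice by disabling the seq-scan path for one session, and check whether an expression GiST index on the geography cast lets both versions skip the full scan of addresses entirely. The index name below is hypothetical.

```sql
-- Session-local experiment: push PG 13 toward the PG 11 plan shape,
-- to confirm the regression is a planner choice rather than an executor one.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT count(e0."begins_on") FROM "events" AS e0
INNER JOIN "addresses" AS a1 ON a1."id" = e0."physical_address_id"
WHERE ST_DWithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography,
                 a1."geom"::geography, '25000');
RESET enable_seqscan;

-- Hypothetical index: lets ST_DWithin on the casted column use an index scan
-- instead of filtering every row of addresses.
CREATE INDEX addresses_geog_idx ON addresses USING gist ((geom::geography));
ANALYZE addresses;
```

If the forced plan matches PG 11's timing, the executor is fine and the question becomes why PG 13 costs the two plans differently.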

algorithm – Python: lru_cache makes LeetCode’s dungeon game even slower

I’m trying to implement a recursive + memoized version of LeetCode’s dungeon game question.

I tried to use @lru_cache():

________________________________________________________
Executed in   14.22 secs    fish           external
   usr time   14.13 secs  101.00 micros   14.13 secs
   sys time    0.04 secs  498.00 micros    0.04 secs

And commented it out to disable it:

________________________________________________________
Executed in   11.73 secs    fish           external
   usr time   11.65 secs  123.00 micros   11.65 secs
   sys time    0.04 secs  556.00 micros    0.04 secs

It looks a lot like lru_cache doesn’t help in this case; am I missing something?

from typing import List
from functools import lru_cache

class Solution:
    def calculateMinimumHP(self, dungeon: List[List[int]]) -> int:
        height = len(dungeon)
        width = len(dungeon[0])

        @lru_cache()
        def helper(start_x, start_y, acc, min):
            cur = dungeon[start_y][start_x]

            acc = cur + acc
            if cur < 0 and acc < min:
                min = acc

            if start_x == width - 1 and start_y == height - 1:
                return min

            if start_x < width - 1:
                right_res = helper(start_x + 1, start_y, acc, min)
            else:
                right_res = float("-inf")

            if start_y < height - 1:
                down_res = helper(start_x, start_y + 1, acc, min)
            else:
                down_res = float("-inf")

            ret = max(down_res, right_res)
            return ret


        res = helper(0, 0, 0, 0)
        return 1 if res > 0 else abs(res) + 1

def main():
    sol = Solution()
    long_case = [[2,-8,-79,-88,-12,-87,-5,-56,-55,-42,18,-91,1,-30,-36,42,-96,-26,-17,-69,38,18,44,-58,-33,20,-45,-11,11,15,-40,-92,-62,-51,-23,20,-86,-2,-90,-64,-100,-42,-16,-55,29,-62,-81,-60,7,-5,31,-7,40,19,-53,-81,-77,42,-87,37,-43,37,-50,-21,-86,-28,13,-18,-65,-76],
[-67,-23,-62,45,-94,-1,-95,-66,-41,37,33,-96,-95,-17,12,30,-4,40,-40,-89,-89,-25,-62,10,-19,-53,-36,38,-21,1,-41,-81,-62,3,-96,-17,-75,-81,37,32,-9,-80,-41,-13,-58,1,40,-13,-85,-78,-67,-36,-7,48,-16,2,-69,-85,9,15,-91,-32,-16,-84,-9,-31,-62,35,-11,28],
[39,-28,1,-31,-4,-39,-64,-86,-68,-72,-68,21,-33,-73,37,-39,2,-59,-71,-17,-60,4,-16,-92,-15,10,-99,-37,21,-70,31,-10,-9,-45,6,26,8,30,13,-72,5,37,-94,35,9,36,-96,47,-61,15,-22,-60,-96,-94,-60,43,-48,-79,19,24,-40,33,-18,-33,50,42,-42,-6,-59,-17],
[-95,-40,-96,42,-49,-3,6,-47,-38,31,-25,-61,-18,-52,-80,-55,29,27,22,6,29,-89,-9,14,-77,-26,-2,-7,-2,-64,-100,40,-52,-15,-76,13,-27,-83,-70,13,-62,-54,-92,-71,-65,-18,26,37,0,-58,4,43,-5,-33,-47,-21,-65,-58,21,2,-67,-62,-32,30,-4,-46,18,21,2,-5],
[-5,34,41,11,45,-46,-86,31,-57,42,-92,43,-37,-9,42,-29,-3,41,-71,13,-8,37,-36,23,17,-74,-12,-55,-18,-17,-13,-76,-18,-90,-5,14,7,-82,-19,-16,44,-96,-88,37,-98,8,17,9,-2,-29,11,-39,-49,-95,20,-33,-37,-42,42,26,-28,-21,-44,-9,17,-26,-27,24,-60,-19]]
    print(sol.calculateMinimumHP(long_case))


if __name__ == '__main__':
    main()
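For what it's worth, the cache can't help in the code above because `acc` and `min` are part of the cache key and differ on almost every path, so nearly every call is a miss and the hashing of arguments is pure overhead. A sketch of a memoized formulation that keys only on the position (this is the standard DP for this problem, not the original code): compute the minimum HP required on entering each cell, working back from the goal.

```python
from functools import lru_cache
from typing import List

def calculate_minimum_hp(dungeon: List[List[int]]) -> int:
    height, width = len(dungeon), len(dungeon[0])

    @lru_cache(maxsize=None)
    def need(x: int, y: int) -> int:
        # Minimum HP required when stepping onto cell (x, y).
        if x == width - 1 and y == height - 1:
            return max(1, 1 - dungeon[y][x])
        best = float("inf")
        if x < width - 1:
            best = min(best, need(x + 1, y))
        if y < height - 1:
            best = min(best, need(x, y + 1))
        # We must survive this cell and still have enough for the best next cell.
        return max(1, best - dungeon[y][x])

    return need(0, 0)
```

With only `(x, y)` in the key there are at most `height * width` distinct calls, so the memoization actually pays off.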

performance – Production database so much slower than development database?

I am running a script on two servers: one on my local Docker development system, and one on production. However, the same script runs much slower on production than on development. I am using MariaDB 10.5 on both. My development system has 12 GB of RAM while production has 32 GB. I temporarily turned off nginx and ran the test at midnight, when there were basically no requests being made. It’s also the exact same database. I also ran these queries directly on the MySQL console, and the speed was still much slower.

This is in my development:
(screenshot: query timings on development)

This is in my production:
(screenshot: query timings on production)

I’m not asking for someone to solve this but how can I investigate this? What can be the possible culprits?

htop on development while running the script
(screenshot: htop on development)

htop on production while running the script
(screenshot: htop on production)

iotop had similar results.

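Since the question is where to look: a first pass I would try (generic MariaDB diagnostics, not specific to this setup) is comparing the buffer-pool and durability settings on both servers, since differences there commonly explain gaps of this size between two installs of the same database.

```sql
-- Compare these on development and production; a small buffer pool or
-- stricter flushing on production is a common culprit.
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW GLOBAL VARIABLES LIKE 'sync_binlog';

-- Is the buffer pool actually warm? A high ratio of disk reads
-- (Innodb_buffer_pool_reads) to logical reads suggests it is not.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```

Beyond the server config, disk latency is the other usual suspect; production storage (network volumes, virtualized disks) is often slower than a local SSD even when CPU and RAM are better.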

performance tuning – Why is ParallelTable even slower than Table for large tensor construction?

I would like to create a large rank-4 tensor by using both Table and ParallelTable.

What is actually constructed is given below:
$\mathcal{L}_{m_1,m_2,V_1,V_2}=i\left(\xi^*_2(m_1,v_1)-\xi^*_2(m_2,v_2)\right)V_d(m_1,v_1)V_d(v_2,m_2)$

where the function $\xi_2(m,v;V)=i\frac{\beta_2^2}{2\beta_1^2}\left(1-\left(e^{\frac{E_{v}-E_m-V/2}{kT}}+1\right)^{-1}\right)\sqrt{4\beta^2_1-(E_{v}-E_m-V/2)^2}$

and matrix elements $V_d(m,v)=\langle m|v\rangle=e^{-\lambda^2/2\Omega^2}\sum_{i=0}^{\nu}\sum_{j=0}^{m}\delta_{m-j,\nu-i}(-1)^j\left(\frac{\lambda}{\Omega}\right)^{i+j}\frac{1}{i!j!}\sqrt{\frac{m!\nu!}{(m-j)!(\nu-i)!}}$

But I don’t know why ParallelTable doesn’t give any improvement in performance.

See the code below

Clear["Global`*"]
Ns = 41; (*number of basis states*)
Nb = 7;
V = 1.84;
λ = 0.3;
Ω = 0.5;
ε0 = 0.5;
β1 = 1; β2 = 0.05; kT = 0.0259/300*10; (*10 K*)

(*Franck-Condon factors*)
FK = E^(-λ^2/(2 Ω^2)) Table[
    Sum[KroneckerDelta[m - j, ν - i] (-1)^j (λ/Ω)^(i + j) 1./(i! j!)
      Sqrt[(m! ν!)/((m - j)! (ν - i)!)],
     {i, 0, ν}, {j, 0, m}],
    {m, 0, Ns - 1}, {ν, 0, Ns - 1}];

(*Eigenenergies of oscillator*)
Em = Table[Ω (m + 1/2), {m, 0, Ns - 1}];
(*Eigenenergies of shifted oscillator*)
Eν = Table[Ω (ν + 1/2) + ε0 - λ^2/Ω, {ν, 0, Ns - 1}];

(*Define elementary functions*)
cf = Compile[{{x, _Real}},
    If[x < -300., 1., If[x > 300., 0., 1./(1. + Exp[x])]],
    CompilationTarget -> "C", RuntimeAttributes -> {Listable},
    Parallelization -> True, RuntimeOptions -> "Speed"];
(*the energy argument is named en because E is the protected built-in constant*)
cΓ0 = Compile[{{en, _Real}, {qV, _Real}, {β1, _Real}, {β2, _Real}},
    If[Abs[en - qV/2.] <= 2 β1,
     β2^2/β1^2 Sqrt[4 β1^2 - (en - qV/2.)^2], 0.],
    CompilationTarget -> "C", RuntimeAttributes -> {Listable},
    Parallelization -> True, RuntimeOptions -> "Speed"];
ξ2L[en_, qV_] := I/2 (1 - cf[(en - qV/2)/kT]) cΓ0[en, qV, β1, β2];
ξ2Lmv = Table[ξ2L[Eν[[v + 1]] - Em[[m + 1]], V],
    {m, 0, Ns - 1}, {v, 0, Ns - 1}];

Table[(Conjugate[ξ2Lmv[[m1, V1]]] - ξ2Lmv[[m2, V2]]) FK[[m1, V1]] FK[[m2, V2]],
   {m1, 1, Ns}, {V1, 1, Ns}, {m2, 1, Ns}, {V2, 1, Ns}]; // AbsoluteTiming
ParallelTable[(Conjugate[ξ2Lmv[[m1, V1]]] - ξ2Lmv[[m2, V2]]) FK[[m1, V1]] FK[[m2, V2]],
   {m1, 1, Ns}, {V1, 1, Ns}, {m2, 1, Ns}, {V2, 1, Ns}]; // AbsoluteTiming
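A likely explanation (my reading, not established in the post): each subkernel of ParallelTable must ship its chunk of an Ns^4 complex tensor back to the main kernel, and that transfer can cost more than the cheap per-element arithmetic it parallelizes. A sketch that avoids the element-by-element loop entirely, assuming the matrices ξ2Lmv and FK from the code above, builds the tensor as a difference of two outer products:

```mathematica
(* L[[m1, V1, m2, V2]] = (Conjugate[ξ2Lmv[[m1, V1]]] - ξ2Lmv[[m2, V2]]) FK[[m1, V1]] FK[[m2, V2]]
   = A[[m1, V1]] FK[[m2, V2]] - FK[[m1, V1]] B[[m2, V2]]  with: *)
A = Conjugate[ξ2Lmv] FK;  (*elementwise product, Ns x Ns*)
B = ξ2Lmv FK;
Ltensor = TensorProduct[A, FK] - TensorProduct[FK, B]; // AbsoluteTiming
```

Expanding the outer products term by term reproduces the original entry exactly, and the whole construction runs as vectorized array arithmetic in a single kernel.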

linux – Why does the R Shiny Server get slower?

I have some Shiny apps running on Shiny Server version 1.5.3.838 (libc6 (>= 2.7), r-base (>= 2.15.1)), with R version 3.4.4 (2018-03-15) on Ubuntu 16.04.7 LTS.

Lately the server has become extremely slow. Every now and then I run:

$ sudo systemctl restart shiny-server

and the apps run faster again, but after a while they slow down once more.

Is there any solution for this? Or is there a strategy to follow in order to find the cause of this?

Project Euler #12 in Julia slower than Python?

My Julia code, almost identical to the Python code (see below), runs in 4.6 s while the Python version runs in 2.4 s. Obviously there is a lot of room for improvement.

function Problem12()
    #=
     The sequence of triangle numbers is generated by adding the natural
    numbers. So the 7th triangle number would be:
    1 + 2 + 3 + 4 + 5 + 6 + 7 = 28.

    The first ten terms would be:
    1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...

    Let us list the factors of the first seven triangle numbers:

     1: 1
     3: 1,3
     6: 1,2,3,6
    10: 1,2,5,10
    15: 1,3,5,15
    21: 1,3,7,21
    28: 1,2,4,7,14,28

    We can see that 28 is the first triangle number to have over five divisors.

    What is the value of the first triangle number to have over five hundred
    divisors?
    =#

    function num_divisors(n)
        res = isqrt(n)
        divs = Int[]
        for i in 1:res
            if n % i == 0
                push!(divs, i)
            end
        end
        if res^2 == n
            pop!(divs)
        end
        return 2*length(divs)
    end

    triangle = 0
    for i in Iterators.countfrom(1)
        triangle += i
        if num_divisors(triangle) > 500
            return string(triangle)
        end
    end
end

Python version below:

import itertools
from math import sqrt, floor


# Returns the number of integers in the range [1, n] that divide n.
def num_divisors(n):
    end = floor(sqrt(n))
    divs = []
    for i in range(1, end + 1):
        if n % i == 0:
            divs.append(i)
    if end**2 == n:
        divs.pop()
    return 2*len(divs)


def compute():
    triangle = 0
    for i in itertools.count(1):
        # This is the ith triangle number, i.e. num = 1 + 2 + ... + i =
        # = i*(i+1)/2
        triangle += i
        if num_divisors(triangle) > 500:
            return str(triangle)
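As an aside on the algorithm itself (it applies equally to both languages): the list exists only to be counted, so a plain counter avoids all the append/pop work; it also fixes the perfect-square case, where the pop-based version undercounts by one (the square root should be counted once, giving 2·len − 1, not 2·(len − 1)). A sketch:

```python
from math import isqrt

def num_divisors(n: int) -> int:
    # Count divisor pairs (d, n // d) with d <= sqrt(n).
    end = isqrt(n)
    count = 0
    for i in range(1, end + 1):
        if n % i == 0:
            count += 2
    # A perfect square counts its root twice; correct for that.
    if end * end == n:
        count -= 1
    return count
```

The same shape translates directly to Julia (`isqrt`, an integer counter, no array), which also sidesteps the cost of growing a vector in the hot loop.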

mysql – Why is LIMIT 0,1 slower than LIMIT 0,17?

I’m trying to analyze why the following query is slower with LIMIT 0,1 than with LIMIT 0,100.

I’ve added SQL_NO_CACHE for testing purposes.

Query:

 SELECT 
  SQL_NO_CACHE  SQL_CALC_FOUND_ROWS wp_posts.*, 
  low_stock_amount_meta.meta_value AS low_stock_amount
FROM 
  wp_posts 
  LEFT JOIN wp_wc_product_meta_lookup wc_product_meta_lookup ON wp_posts.ID = wc_product_meta_lookup.product_id 
  LEFT JOIN wp_postmeta AS low_stock_amount_meta ON wp_posts.ID = low_stock_amount_meta.post_id 
  AND low_stock_amount_meta.meta_key = '_low_stock_amount' 
WHERE 
  1 = 1 
  AND wp_posts.post_type IN ('product', 'product_variation') 
  AND (
    (wp_posts.post_status = 'publish')
  ) 
  AND wc_product_meta_lookup.stock_quantity IS NOT NULL 
  AND wc_product_meta_lookup.stock_status IN('instock', 'outofstock') 
  AND (
    (
      low_stock_amount_meta.meta_value > '' 
      AND wc_product_meta_lookup.stock_quantity <= CAST(
        low_stock_amount_meta.meta_value AS SIGNED
      )
    ) 
    OR (
      (
        low_stock_amount_meta.meta_value IS NULL 
        OR low_stock_amount_meta.meta_value <= ''
      ) 
      AND wc_product_meta_lookup.stock_quantity <= 2
    )
  ) 

ORDER BY 
  wp_posts.ID DESC 
LIMIT 
  0, 1

EXPLAIN shows the exact same output for both:

1   SIMPLE  wp_posts    index   PRIMARY,type_status_date    PRIMARY 8   NULL    27071   Using where
1   SIMPLE  low_stock_amount_meta   ref post_id,meta_key    meta_key    767 const   1   Using where
1   SIMPLE  wc_product_meta_lookup  eq_ref  PRIMARY,stock_status,stock_quantity,product_id  PRIMARY 8   woocommerce-admin.wp_posts.ID   1   Using where

The average query time is 350ms with LIMIT 0,1

The average query time is 7ms with LIMIT 0,100

The query performance gets faster starting with LIMIT 0,17

I’ve added another column to the ORDER BY clause as suggested in this question, but that triggers Using filesort in the explain output:

Order by wp_posts.post_date, wp_posts.ID desc

1   SIMPLE  wp_posts    ALL PRIMARY,type_status_date    NULL    NULL    NULL    27071   Using where; Using filesort
1   SIMPLE  low_stock_amount_meta   ref post_id,meta_key    meta_key    767 const   1   Using where
1   SIMPLE  wc_product_meta_lookup  eq_ref  PRIMARY,stock_status,stock_quantity,product_id  PRIMARY 8   woocommerce-admin.wp_posts.ID   1   Using where

Is there a way to work around this without altering indexes, and why is it happening?

It’s also interesting that the query time improves starting with LIMIT 0,17. I’m not sure why 17 is a magic number here.

Update 1: I just tried adding FORCE INDEX(PRIMARY), and now LIMIT 0,100 performs as badly as LIMIT 0,1.
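One workaround commonly tried for this pattern (a "deferred join" rewrite; the sketch below is hypothetical and untested against this schema) is to resolve the LIMIT on a narrow ID-only subquery first and join the wide rows back afterwards, so the optimizer does not walk the PK backwards fetching full rows that the filters then reject:

```sql
SELECT p.*, m.meta_value AS low_stock_amount
FROM (
  SELECT wp_posts.ID
  FROM wp_posts
  JOIN wp_wc_product_meta_lookup wc_product_meta_lookup
    ON wp_posts.ID = wc_product_meta_lookup.product_id
  LEFT JOIN wp_postmeta AS low_stock_amount_meta
    ON wp_posts.ID = low_stock_amount_meta.post_id
   AND low_stock_amount_meta.meta_key = '_low_stock_amount'
  WHERE 1 = 1
    -- ...same filters as the original query...
  ORDER BY wp_posts.ID DESC
  LIMIT 0, 1
) AS ids
JOIN wp_posts p ON p.ID = ids.ID
LEFT JOIN wp_postmeta AS m
  ON p.ID = m.post_id AND m.meta_key = '_low_stock_amount'
ORDER BY p.ID DESC;
```

Whether this helps depends on the same row estimates that are misleading the optimizer now, so it is worth comparing EXPLAIN output for both forms.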