query performance – In PostgreSQL, how can I make Bitmap Index Scan parallelized?

I want to improve the performance of an SQL statement.

I am on version 13.
Here are the sample codes and the query I am interested in.

drop table ords;
CREATE TABLE ords (
ORD_ID INT NOT NULL,
CUST_ID VARCHAR(10) NOT NULL,
ORD_DATE DATE NOT NULL,
ETC_CONTENT VARCHAR(100));

ALTER TABLE ords ADD CONSTRAINT ORDS_PK PRIMARY KEY(ORD_ID);

CREATE INDEX ORDS_X01 ON ORDS (CUST_ID);

INSERT INTO ORDS
SELECT i
      ,lpad(mod(i,1000)::text,10,'cust')
      ,date '2021-06-07'+mod(i,624)
      ,rpad('x',100,'x')
  FROM generate_series(1,1000000) a(i);

drop table delivery;

CREATE TABLE delivery (
ORD_ID INT NOT NULL,
VEHICLE_ID VARCHAR(10) NOT NULL,
START_DATE DATE NOT NULL,
END_DATE   DATE NOT NULL,
ETC_REMARKS VARCHAR(100));

INSERT INTO DELIVERY
SELECT i
     , MOD(i,1000)
     , date '2021-01-01' + mod(i,1000)
     , date '2021-01-05' + mod(i,1000)
     , rpad('x',100,'x')
  FROM generate_series(1,1000000) a(i);

ALTER TABLE DELIVERY ADD CONSTRAINT DELIVERY_PK primary key (ORD_ID);
CREATE INDEX DELIVERY_X01 ON DELIVERY(END_DATE, START_DATE);
CREATE INDEX DELIVERY_X02 ON DELIVERY(VEHICLE_ID);
select pg_relation_size('ords'), pg_relation_size('delivery');
analyze ords;
analyze delivery;
EXPLAIN(ANALYZE, BUFFERS, COSTS OFF)
SELECT A.*, B.*
  FROM ORDS A LEFT JOIN DELIVERY B
    ON (A.ORD_ID = B.ORD_ID
        AND (B.START_DATE <= DATE '2021-07-12' AND B.END_DATE >= DATE '2021-07-10'
             OR (B.VEHICLE_ID > '990')
             )
        )
 WHERE A.ORD_DATE BETWEEN DATE '2021-06-01' AND DATE '2021-07-10'
;

Below is the execution plan.

 Gather (actual time=86.645..101.685 rows=54501 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=13615 read=23995, temp read=1196 written=1272
   ->  Parallel Hash Left Join (actual time=83.360..87.135 rows=18167 loops=3)
         Hash Cond: (a.ord_id = b.ord_id)
         Buffers: shared hit=13614 read=23995, temp read=1196 written=1272
         ->  Parallel Seq Scan on ords a (actual time=0.047..34.335 rows=18167 loops=3)
               Filter: ((ord_date >= '2021-06-01'::date) AND (ord_date <= '2021-07-10'::date))
               Rows Removed by Filter: 315166
               Buffers: shared hit=4968 read=14263
         ->  Parallel Hash (actual time=42.999..42.999 rows=5333 loops=3)
               Buckets: 32768  Batches: 8  Memory Usage: 608kB
               Buffers: shared hit=8450 read=9732, temp written=280
               ->  Parallel Seq Scan on delivery b (actual time=0.069..40.615 rows=5333 loops=3)
                     Filter: (((start_date <= '2021-07-12'::date) AND (end_date >= '2021-07-10'::date)) OR ((vehicle_id)::text > '990'::text))
                     Rows Removed by Filter: 328000
                     Buffers: shared hit=8450 read=9732
 Planning:
   Buffers: shared hit=20
 Planning Time: 0.357 ms
 Execution Time: 103.282 ms

I had expected that two Bitmap Index Scans using the delivery_x01 and delivery_x02 would appear followed by the BitmapOr operation when fetching rows from the DELIVERY table. Unlike what I thought, the planner chose to do a table scan with the parallelism.

To compare the execution plan I expected with the plan PostgreSQL chose, I set the parameter max_parallel_workers_per_gather to 0 and re-ran the SQL statement.

set max_parallel_workers_per_gather = 0;

–I re-ran the query and here is the resulting execution plan.

Hash Right Join (actual time=100.080..119.375 rows=54501 loops=1)
   Hash Cond: (b.ord_id = a.ord_id)
   Buffers: shared hit=3304 read=18847, temp read=903 written=903
   ->  Bitmap Heap Scan on delivery b (actual time=1.374..4.277 rows=16000 loops=1)
         Recheck Cond: (((end_date >= '2021-07-10'::date) AND (start_date <= '2021-07-12'::date)) OR ((ve
hicle_id)::text > '990'::text))
         Heap Blocks: exact=2182
         Buffers: shared hit=2919
         ->  BitmapOr (actual time=1.108..1.109 rows=0 loops=1)
               Buffers: shared hit=737
               ->  Bitmap Index Scan on delivery_x01 (actual time=0.809..0.810 rows=7000 loops=1)
                     Index Cond: ((end_date >= '2021-07-10'::date) AND (start_date <= '2021-07-12'::date)
)
                     Buffers: shared hit=726
               ->  Bitmap Index Scan on delivery_x02 (actual time=0.298..0.298 rows=9000 loops=1)
                     Index Cond: ((vehicle_id)::text > '990'::text)
                     Buffers: shared hit=11
   ->  Hash (actual time=98.373..98.374 rows=54501 loops=1)
         Buckets: 32768  Batches: 4  Memory Usage: 2331kB
         Buffers: shared hit=384 read=18847, temp written=697
         ->  Seq Scan on ords a (actual time=0.122..85.072 rows=54501 loops=1)
               Filter: ((ord_date >= '2021-06-01'::date) AND (ord_date <= '2021-07-10'::date))
               Rows Removed by Filter: 945499
               Buffers: shared hit=384 read=18847
 Planning:
   Buffers: shared hit=12
 Planning Time: 0.232 ms
 Execution Time: 120.843 ms

By using Bitmap Index Scan and BitmapOr operations I could drop the number of block I/Os, but the execution time increased from 103 ms to 120 ms. It seems that the parallelism is the main factor of the execution time gap. So I infer that if parallelism kicks in in the Bitmap Index Scan operation, the query would become faster.

Finally, My question is:
How can I make the Bitmap Index Scan operation parallelized?