I’m trying to optimize or parallelize the following code as it’s slow for larger n

```
projOuterAllAngles(n_,a_):=Block(
{ind = indexToRep /@ Range(n^2), limits = Table(1-k/(a-1/2),{k,0,a-1}), overlap, matrixoflists, listofmatrices},
overlap = Function({r1,r2},integralsForListofLimits(r1, r2, limits));
matrixoflists = Outer(overlap,ind,ind,1);
listofmatrices = Transpose(matrixoflists,{3,2,1});
MapThread(convertToProjection(#) & , {listofmatrices}))
```

It requires the following helper functions:

```
indexToRep(n_) := indexToRep(n) = Module({j,m},j=Floor(Sqrt(n-1));
m=n-1-j^2-j;
{j,m})
sphNormCoeffs(l_,m_):=Sqrt((2*l+1)*(l-m)!/(4*Pi*(l+m)!))
(* Outputs a list of length Length(limits) where each non-zero element is the result of a spherical harmonic integral with an upper limit given by the element of list limits *)
integralsForListofLimits(r1_?ListQ,r2_?ListQ,limits_) := integralsForListofLimits(r1, r2, limits) = integralsForListofLimits(r2, r1, limits) = Block(
{(Theta), (Phi), j1 = First(r1), m1 = Last(r1), j2 = First(r2), m2 = Last(r2)},
Return(If(Last(r1)!=Last(r2), ConstantArray(0.0, Length(limits)),
Subtract @@ (2*Pi*sphNormCoeffs(j2,m2)*sphNormCoeffs(j1,m1)*Integrate(LegendreP(j1,m1,x)LegendreP(j2,m2,x),x) /. x -> {1, limits})))
)
convertToProjection(mat_) := Block({u, w, v},
{u, w, v} = SingularValueDecomposition(mat);
u . DiagonalMatrix(If(Re(#)>0.5,1.,0.) & /@ Diagonal(w)) . ConjugateTranspose(v)
)
```

I tried replacing `Outer`

with `Parallelize @ Outer`

and similarly `MapThread`

with `Parallelize @ MapThread`

, but it was actually much slower on 8 parallel cores than just on a single one.

`projOuterAllAngles(n,a)`

outputs a list of dimensions `{a, n^2,n^2}`

and it basically computes a list of `a`

matrices of size n^2 by n^2. These matrices are block-diagonal with non-zero entries being the result of the integrals from `integralsForListofLimits`

.

Any help on how I can make this code faster would be much appreciated.