performance – How can I parallelize this code using OpenMP?


How can I parallelize the following code using OpenMP?

#include <iostream>
#include <cmath>
#include <vector>
#include <omp.h>

using namespace std;

int main()
{

    int i,j,k;
    int np = 1000000;
    const int kmax = 100;        // const so that E has a compile-time size
    double E[kmax+1] = {0.0};    // E[0] is unused; k runs from 1 to kmax

    // need to be parallelized
    for (i = 0; i < np-1; i++){
        for (j = i+1; j < np; j++){

            double x = sin(double(2*i+j)/(3.0*np));//x,y,z,d1 for test.
            double y = sin(double(i+j)/(2.0*np));
            double z = sin(double(i)*j/np/np);     // multiply as double: i*j overflows int
            double d1 = sqrt(x*x + y*y + z*z);

            for (k = 1; k <= kmax; k++){

                double d2 = k * d1;
                E[k] += x + y + sin(d2) / d2;
            }

        }
    }
    
    
    return 0;
}

I am looking for suggestions on how to parallelize the accumulation into E with OpenMP. The i and j loops each run up to 1,000,000, while the k loop runs up to 100. The quantities x, y, z and d1 are placeholders used only for this test case. I would like the computation to scale well on a PC with 40 cores.
The code is compiled with: g++ test.cpp -o run -fopenmp
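
For reference, here is a minimal sketch of one direction I am considering, not a tested solution for the full problem size. It assumes a compiler that supports OpenMP 4.5 array-section reductions (for example a recent g++), so that each thread accumulates into its own private copy of E and the copies are summed at the end. The function name accumulate_E, the use of std::vector, and the schedule(dynamic, 16) chunk size are my own assumptions for illustration.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not part of the code above): accumulates E[1..kmax]
// over all pairs i < j. E[0] stays unused, as in the serial version.
std::vector<double> accumulate_E(int np, int kmax)
{
    std::vector<double> result(kmax + 1, 0.0);
    double *E = result.data();  // array-section reductions need a pointer or array

    // Each thread gets a private, zero-initialized copy of E[0..kmax];
    // the copies are added together when the parallel loop ends.
    // schedule(dynamic) is an assumption: the j loop shrinks as i grows,
    // so an even static split of i would leave the threads unevenly loaded.
    #pragma omp parallel for schedule(dynamic, 16) reduction(+ : E[0:kmax + 1])
    for (int i = 0; i < np - 1; i++) {
        for (int j = i + 1; j < np; j++) {
            // Loop variables are declared locally, so they are private per thread.
            double x = std::sin(double(2 * i + j) / (3.0 * np));
            double y = std::sin(double(i + j) / (2.0 * np));
            double z = std::sin(double(i) * j / np / np);  // multiply as double to avoid int overflow
            double d1 = std::sqrt(x * x + y * y + z * z);

            for (int k = 1; k <= kmax; k++) {
                double d2 = k * d1;
                E[k] += x + y + std::sin(d2) / d2;
            }
        }
    }
    return result;
}
```

Calling accumulate_E(np, kmax) from main would replace the triple loop. Because the threads add their partial sums in a different order than the serial loop, the results can differ from the serial version in the last few digits. Without -fopenmp the pragma is ignored and the function simply runs serially.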