If you run your computations with too few prisms per subdomain (e.g. fewer than 1,000), the parallel run may be inefficient and can sometimes even give wrong results.
In your case, you are far from that situation. Depending on the cluster you use, my usual rule of thumb for a good number of cores is about 10,000 elements (triangles in 2D, prisms in 3D) per subdomain/core, sometimes a little fewer (e.g. 5,000). If possible, I would try 720 cores or more with your setup (338,810 2D elements x 22 layers).
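To make the arithmetic explicit, here is a small sketch of the rule of thumb above. The function name and the way the 3D count is formed (2D elements multiplied by the number of layers, as in the figures quoted above) are my own assumptions for illustration, not something taken from the code or its documentation.

```python
import math

def cores_for_target(n_2d_elements, n_layers, elements_per_core):
    """Hypothetical helper: estimate a core count from a target load per core.

    Assumes the 3D element count is simply n_2d_elements * n_layers.
    """
    total = n_2d_elements * n_layers
    return total, math.ceil(total / elements_per_core)

# Figures quoted in this post
total, cores_at_10k = cores_for_target(338_810, 22, 10_000)
_, cores_at_5k = cores_for_target(338_810, 22, 5_000)
load_at_720 = total / 720  # prisms per core if 720 cores are used

print(f"total prisms:          {total:,}")
print(f"cores at 10,000/core:  {cores_at_10k}")
print(f"cores at 5,000/core:   {cores_at_5k}")
print(f"prisms/core at 720:    {load_at_720:,.0f}")
```

With these numbers the mesh has about 7.45 million prisms, so 720 cores gives a little over 10,000 prisms per core, and going down to 5,000 per core would mean roughly twice as many cores. Whether the larger count actually pays off depends on the cluster.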
Hope this helps,
Chi-Tuan