I have encountered an issue when running Telemac3d on the University of Edinburgh ARCHER2 HPC system (Cray architecture) across multiple nodes.
My models run without error when using all cores on a single node (128 cores per node), but when I partition across multiple nodes I get a floating-point exception (FPE) that traces back to the MPI_ALLTOALLV call in the streamline solver, which is used when the method of characteristics is selected to solve the advection equations.
The same models run successfully across multiple nodes on the University of Edinburgh Cirrus HPC system.
If I switch to a different advection scheme the issue goes away, so it appears to be specific to the streamline solver and the way its message passing behaves.
Has anyone else encountered this problem running Telemac3d on a Cray architecture?
Are there any quirks of the streamline solver that could lead to an FPE in the MPI_ALLTOALLV call?
The attached file gives an example of the backtrace generated by the error.
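In case a concrete test is useful, below is a minimal standalone Fortran sketch I put together (my own code, not Telemac source; all names are made up) of the variable-count MPI_ALLTOALLV exchange pattern the characteristics solver relies on, including zero-length messages. Compiling it with FPE trapping enabled (e.g. gfortran's -ffpe-trap=invalid,zero,overflow, or the equivalent Cray compiler option) is how I would check whether the bare collective can trip the trap on ARCHER2.

! Hypothetical reproducer sketch, not Telemac source: each rank sends a
! variable-length block to every other rank via MPI_ALLTOALLV, mimicking
! the particle-exchange pattern of the streamline solver.
program alltoallv_check
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, i
  integer, allocatable :: scounts(:), rcounts(:), sdispl(:), rdispl(:)
  double precision, allocatable :: sbuf(:), rbuf(:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  allocate(scounts(nprocs), rcounts(nprocs), sdispl(nprocs), rdispl(nprocs))

  ! Variable message sizes, including zero-length messages, which the
  ! characteristics exchange produces when no trajectories cross a
  ! given sub-domain boundary.
  do i = 1, nprocs
    scounts(i) = mod(rank + i, 3)      ! 0, 1 or 2 values per destination
  end do

  ! Tell every rank how much it will receive from each peer.
  call MPI_ALLTOALL(scounts, 1, MPI_INTEGER, rcounts, 1, MPI_INTEGER, &
                    MPI_COMM_WORLD, ierr)

  ! Build send/receive displacements as running sums of the counts.
  sdispl(1) = 0
  rdispl(1) = 0
  do i = 2, nprocs
    sdispl(i) = sdispl(i-1) + scounts(i-1)
    rdispl(i) = rdispl(i-1) + rcounts(i-1)
  end do

  allocate(sbuf(max(1, sum(scounts))), rbuf(max(1, sum(rcounts))))
  sbuf = dble(rank)

  call MPI_ALLTOALLV(sbuf, scounts, sdispl, MPI_DOUBLE_PRECISION, &
                     rbuf, rcounts, rdispl, MPI_DOUBLE_PRECISION, &
                     MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'ALLTOALLV completed without trapping'
  call MPI_FINALIZE(ierr)
end program alltoallv_check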
Thanks,
Chris