I found this on Stack Overflow:
"If you want to use srun with Intel MPI, an extra step is required. You first need to
export I_MPI_PMI_LIBRARY=/path/to/slurm/pmi/library/libpmi.so"
So I did that in my cfg, and it got rid of the "Fatal error in PMPI_Init: Other MPI error, error stack" message.
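(In case it helps anyone else: where libpmi.so actually lives depends on how Slurm was installed, so the /path/to/... placeholder in that quote has to be filled in per cluster. Something like the two commands below should turn it up; the /opt/software prefix is just where Slurm sits on ours.)

find /opt/software -name "libpmi*.so*" 2>/dev/null
ldconfig -p | grep libpmi

With the variable pointing at the right library, a quick MPI hello world over srun works: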
srun -n 4 ./hello_world
node 2 : Hello world
node 1 : Hello world
node 3 : Hello world
node 0 : Hello world
So I added the line below to my config
export I_MPI_PMI_LIBRARY=/opt/software/slurm-20.11.0/lib/libpmi.so
You would think by now the issue would be solved...
Here is the final HPC_STDIN:
#!/bin/bash
#SBATCH --ntasks=28
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=4000M
#SBATCH --time=01:00:00
#SBATCH -o OK-%j.out # Write job output
#SBATCH -e OK-%j.err # Write job error
module restore telemacv8p2_modules
source /home/okurum/v8p1/configs/pysource.SikuAceNet.sh
config.py
export I_MPI_PMI_LIBRARY=/opt/software/slurm-20.11.0/lib/libpmi.so
srun -n 28 /home/okurum/test_v8p1/t2d_test.cas_2021-01-19-06h07min15s/out_user_fortran
exit
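In case it helps with diagnosing the hang, here is how the same script could be instrumented (the extra lines are only debug aids and were not part of the script that actually ran; I_MPI_DEBUG is Intel MPI's own verbosity switch, and the bare srun hostname is just a cheap launch test):

export I_MPI_PMI_LIBRARY=/opt/software/slurm-20.11.0/lib/libpmi.so
export I_MPI_DEBUG=5            # Intel MPI reports the PMI library and fabric it picked at startup
env | grep -E 'I_MPI|SLURM'     # confirm the exports survive the module restore / source above
srun -n 28 hostname             # sanity check that all 28 tasks actually launch
srun -n 28 /home/okurum/test_v8p1/t2d_test.cas_2021-01-19-06h07min15s/out_user_fortran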
sq shows that the run is still going, but after the initial partitioning nothing gets updated in /home/okurum/test_v8p1/t2d_test.cas_2021-01-19-06h07min15s.
Checking the logs, it seems the model hangs at the point below. My TPXO data is there and the path to it in my cas file is correct, so I am not sure why it hangs like this.
*************************************
* END OF MEMORY ORGANIZATION: *
*************************************
INITIALIZING TELEMAC2D FOR
INBIEF (BIEF): NOT A VECTOR MACHINE (ACCORDING TO YOUR DATA)
FONSTR : FRICTION COEFFICIENTS READ IN THE
GEOMETRY FILE
STRCHE (BIEF): NO MODIFICATION OF FRICTION
NUMBER OF LIQUID BOUNDARIES: 2
CORFON (TELEMAC2D): NO MODIFICATION OF BOTTOM
INITIALISATION BASED ON TPXO:
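While it sits like this, one rough thing I can check is whether the 28 ranks are deadlocked or still busy (the node name is a placeholder taken from squeue, and this assumes the cluster allows ssh to the compute node):

squeue -u $USER                                 # job id and the node it is running on
ssh <nodename> 'top -b -n 1 | grep out_user'    # near-0% CPU on the ranks suggests a wait/deadlock, ~100% a spin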