
TOPIC: Problem launching a T3D case on a Linux cluster

Problem launching a T3D case on a Linux cluster 8 years 5 months ago #22039

  • kingja.x
I use the redhatopenmpi configuration, so I have included -c redhatopenmpi in the jobscript.

Unfortunately, when I try to include the 'sys.path...' line in runcode.py I get the following error:

Traceback (most recent call last):
File "runcode.py", line 127, in <module>
sys.path.append(path.join( path.dirname(sys.argv[0]), r'/scratch/sce9jak/opentelemac/tags/v7p1r0/scripts/python27'))
NameError: name 'path' is not defined

If I submit the jobscript from the Telemac Python directory, however (in my case /scratch/sce9jak/opentelemac/tags/v7p1r0/scripts/python27), everything seems okay and the job starts to run.

A folder (t2d_tests_channel.cas_2016-06-15-15h07min24s) is created in the directory containing the .cas file, and a number of files are copied into it.

However, I come across another problem (see the attached error and line 78 in the log file):

mpiexec_raven198: cannot connect to local mpd (/var/tmp/pbs.352518.raven0/mpd2.console_raven198_sce9jak); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
_____________
runcode::main:
:
|runCode: Fail to run
|mpiexec -wdir /scratch/sce9jak/opentelemac/jobs/v7p1r0/examples/tests_channel/t2d_tests_channel.cas_2016-06-15-15h07min24s -n 4 /scratch/sce9jak/opentelemac/jobs/v7p1r0/examples/tests_channel/t2d_tests_channel.cas_2016-06-15-15h07min24s/out_telemac2d

After a quick search I've found that mpd is related to MPICH2. When I compiled TELEMAC I used an Intel MPI module rather than the MPICH2 module. I'm guessing this is the cause of my problem and that I should re-compile with MPICH2?

Many thanks
Jon

Problem launching a T3D case on a Linux cluster 8 years 5 months ago #22042

  • josekdiaz
Dear Jon,

I advise you to use the default MPI of your system:

which mpirun
mpirun --version


Recompile both Telemac and METIS with that MPI (recommended: use the MPI compiler wrappers for Fortran/C) and edit your cfg file to include the proper dependencies and command calls accordingly, just to avoid some extra tweaking later (or requests to your cluster's admin). A sketch of the relevant cfg entries is given below.
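For orientation only, the parallel-run entries of a v7p1 systel cfg typically look like the lines below; the mpiexec call matches the one in your log, but the exact keys and flags in your redhatopenmpi.cfg may differ, so treat this as a sketch rather than a drop-in:

# launcher used for parallel runs (this is where mpiexec/mpirun is chosen)
mpi_cmdexec: mpiexec -wdir <wdir> -n <ncsize> <exename>
# compile and link through the MPI wrappers so the matching library is picked up
cmd_obj: mpif90 -c -O2 <mods> <incs> <f95name>
cmd_exe: mpif90 -o <exename> <objs> <libs>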

Regards,

José D.

Problem launching a T3D case on a Linux cluster 8 years 5 months ago #22040

  • josekdiaz
Dear Jon,

In a similar fashion to Clemens' response: on the office Ubuntu Linux workstations I've added an alias in the bashrc profile (it may need admin authorization) to add the Telemac paths, which makes usage more comfortable.
locate bashrc

Edit the bashrc profile, using the output of locate
gedit /path/to/profile/.bashrc

Add opentelemac's python scripts to your path, in your case:
export PATH="/scratch/sce9jak/opentelemac/tags/v7p1r0/scripts/python27":$PATH

Create an alias for SYSTELCFG (e.g. tele-ompi), useful if you plan to use multiple Telemac builds or configs:
alias tele-ompi="export SYSTELCFG='/scratch/sce9jak/opentelemac/tags/v7p1r0/configs/redhatopenmpi.cfg'"

Save and close the bashrc file, then relog your session or run "source /path/to/profile/.bashrc" in the terminal.

Try your setup:
tele-ompi # run this every time you open a new terminal and want to use this cfg
gedit $SYSTELCFG # this should open your cfg file; close it afterwards
telemac2d.py /scratch/sce9jak/opentelemac/tags/v7p1r0/examples/telemac2d/bumpflu/bumpflu.cas

I remember that Telemac's py scripts produced empty output on my machines. If this happens to you, edit every single *.py file in the scripts folder and add, at the very top of the file, the shebang instruction that tells Linux to use "env python":
#!/usr/bin/env python
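If you would rather not open every file by hand, a one-liner like the one below can prepend that shebang to all the scripts at once. It assumes GNU sed and that none of the files already start with a shebang, so check first and keep a backup:

cd /scratch/sce9jak/opentelemac/tags/v7p1r0/scripts/python27
# prepend the env-python shebang as the first line of every script
sed -i '1i #!/usr/bin/env python' *.py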

I hope this advice is helpful to you. Regards,

José D.

Problem launching a T3D case on a Linux cluster 8 years 5 months ago #22041

  • konsonaut
Hello Jon,

Maybe you added the sys.path line above this line:
from os import path,walk,mkdir,chdir,remove,sep,environ,listdir,getcwd
I don't have another explanation.
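In other words, the order matters: the name 'path' only exists after that 'from os import path, ...' line has been executed. A minimal sketch of how the top of runcode.py should look (both imports are already in the script; only the placement of the sys.path.append line changes):

import sys
from os import path,walk,mkdir,chdir,remove,sep,environ,listdir,getcwd
# 'path' is defined now, so the append line can use it
sys.path.append(path.join( path.dirname(sys.argv[0]), r'/scratch/sce9jak/opentelemac/tags/v7p1r0/scripts/python27'))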

Before doing that you should be really sure which configuration you compiled Telemac with, which MPI the cluster supports, and so on.
Actually, if OpenMPI is installed on your cluster and you want to drive Telemac with OpenMPI, then you don't need mpd.
Dealing with parallel-processing troubles on a cluster is beyond my knowledge and can be really annoying, so I would recommend asking your administrator.

Best regards,
Clemens

Problem launching a T3D case on a Linux cluster 8 years 5 months ago #22043

  • kingja.x
Thank you both, your advice is greatly appreciated.

I've compiled both Telemac and METIS using the default MPI version, mpi/intel.4.1.0.

The admin helped me edit the config file, so I'm sure they included the necessary dependencies etc.

I think at this point I need to have another look to try to uncover the cause of the latest problem, or speak to the admin, rather than bombard the forum.

Thanks again!
Jon

Problem launching a T3D case on a Linux cluster 8 years 5 months ago #22055

  • konsonaut
Hello Jon,

Another thing that comes to mind, since we had similar troubles in the past: in your configuration file you can try to use mpirun as the launcher instead of mpiexec.
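Concretely, that would mean editing the mpi_cmdexec line in redhatopenmpi.cfg. Assuming it currently matches the mpiexec call shown in your log, the change would look roughly like this:

# before
mpi_cmdexec: mpiexec -wdir <wdir> -n <ncsize> <exename>
# after
mpi_cmdexec: mpirun -wdir <wdir> -n <ncsize> <exename>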

Best regards,
Clemens

Problem launching a T3D case on a Linux cluster 8 years 5 months ago #22061

  • kingja.x
Hi Clemens,

I will give that a go as well. Also, you were correct: once I inserted the line after:

from os import path,walk,mkdir,chdir,remove,sep,environ,listdir,getcwd

Python was able to find the Telemac Python directory, and I could submit my job from the working directory containing the .cas file rather than from .../v7p1r0/scripts/python27.

Many thanks
Jon

Problem launching a T3D case on a Linux cluster 8 years 5 months ago #22077

  • kingja.x
In case anyone else has a similar problem, replacing mpiexec with mpirun in the config file and recompiling has worked for me.

I am now able to produce results when running the example job tests_channel. I am in the process of running other examples to see if any more problems arise.