
TOPIC: mpich error

mpich error 13 years 8 months ago #1214

  • olslewfoot
  • Senior Boarder
  • Posts: 132
  • Thank you received: 3
Dear All

My question is about parallel libraries and their versions: has anyone come across the error I am getting when running makepar90?

I have installed the parallel libraries and their dependencies, which on CentOS (64-bit) were, in order:

MPI -> BLAS -> BLACS -> ScaLAPACK -> PT-SCOTCH / ParMetis -> MUMPS -> telemac.

In each case the libraries used are the latest and have compiled successfully.
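With static archives, the order of these libraries on the link line matters as well as their versions: GNU ld scans archives left to right and only pulls in members that resolve symbols still undefined at that point, so each library has to appear before the ones it depends on. Below is a minimal Python sketch (3.9+ for `graphlib`) that derives a link order from the install order above. The dictionary is read off the chain as posted; treating PT-SCOTCH and ParMetis as one stage, and assuming the install order respects the real dependencies, are both assumptions.

```python
from graphlib import TopologicalSorter

# "library: libraries installed before it", read off the chain
# MPI -> BLAS -> BLACS -> ScaLAPACK -> PT-SCOTCH / ParMetis -> MUMPS -> telemac.
# ASSUMPTION: PT-SCOTCH and ParMetis sit at the same stage.
INSTALLED_AFTER = {
    "BLAS": {"MPI"},
    "BLACS": {"BLAS"},
    "ScaLAPACK": {"BLACS"},
    "PT-SCOTCH": {"ScaLAPACK"},
    "ParMetis": {"ScaLAPACK"},
    "MUMPS": {"PT-SCOTCH", "ParMetis"},
    "telemac": {"MUMPS"},
}


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Order in which the libraries can be compiled (prerequisites first)."""
    return list(TopologicalSorter(graph).static_order())


def link_order(graph: dict[str, set[str]]) -> list[str]:
    """Order for a static link line: each archive before the ones it needs.

    Only valid if the install order respects the real dependencies
    (assumed here), since reversing it then puts users before providers.
    """
    return build_order(graph)[::-1]


if __name__ == "__main__":
    print("build:", " -> ".join(build_order(INSTALLED_AFTER)))
    print("link: ", " ".join(link_order(INSTALLED_AFTER)))
```

The reversed order is conservative: it may place unrelated libraries in a stricter order than needed, but it never puts a provider before the archive that uses it.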

When compiling the telemac parallel executables, I get errors from the MPICH2 library (libmpich.a): some of its functions contain undefined references.
My only explanation is a version incompatibility. Could that be the cause?

Does anyone know which versions of MPICH2 (or of other MPI implementations) successfully build the telemac parallel executables?

Thanks
John

Re:mpich error 13 years 8 months ago #1217

  • jmhervouet
Hello John,

Can you tell us what the undefined references are? We have already been told that some MPI functions written for MPICH can cause problems on some machines when used with MPICH2, and we are currently correcting this for our next version. Another possibility would be to use OpenMPI; some of us do this here and it works.

With best regards,

Jean-Michel Hervouet

Re:mpich error 13 years 8 months ago #1218

  • olslewfoot
Jean-Michel,
Here is an example of the messages I get when compiling telemac2d:
/export/apps/mpich2/lib/libmpich.a(init.o): In function `MPI_Init':
init.c:(.text+0x38): undefined reference to `MPL_env2str'
init.c:(.text+0x55): undefined reference to `MPL_env2bool'
/export/apps/mpich2/lib/libmpich.a(initthread.o): In function `MPI_Init_thread':
initthread.c:(.text+0x421): undefined reference to `MPL_env2bool'
/export/apps/mpich2/lib/libmpich.a(param_vals.o): In function `MPIR_Param_init_params':
param_vals.c:(.text+0xf): undefined reference to `MPL_env2int'
param_vals.c:(.text+0x27): undefined reference to `MPL_env2int'
param_vals.c:(.text+0x3f): undefined reference to `MPL_env2int'
param_vals.c:(.text+0x57): undefined reference to `MPL_env2int'
param_vals.c:(.text+0x6f): undefined reference to `MPL_env2int'

These are just a sample; there are many more, and similar ones for the other modules.
Can you suggest an MPI implementation that would work better?
(I don't have mumps installed yet but probably don't need it).
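With pages of such messages, it helps to tally which symbols are missing and which archive needs them. Below is a minimal Python sketch that assumes the GNU ld message format pasted above (`SAMPLE` is a short excerpt of that log). In this excerpt every missing name starts with `MPL_`; that may point to MPICH2's separate MPL utility library being absent from the link line, rather than to a version mismatch inside libmpich.a. This is an assumption worth checking against the library list printed by the MPICH2 wrapper (`mpif90 -show`).

```python
import re
from collections import Counter

# GNU ld lines such as:
#   /path/libmpich.a(init.o): In function `MPI_Init':
#   init.c:(.text+0x38): undefined reference to `MPL_env2str'
ARCHIVE_RE = re.compile(r"^(\S+\.a)\(([^)]+)\): In function")
UNDEF_RE = re.compile(r"undefined reference to `([^']+)'")


def summarize_undefined(log: str) -> dict[str, Counter]:
    """Count unresolved symbols per archive whose code references them."""
    summary: dict[str, Counter] = {}
    archive = "<unknown>"
    for line in log.splitlines():
        header = ARCHIVE_RE.match(line)
        if header:
            archive = header.group(1)  # remember which archive we are in
            continue
        missing = UNDEF_RE.search(line)
        if missing:
            summary.setdefault(archive, Counter())[missing.group(1)] += 1
    return summary


def prefixes(counts: Counter) -> Counter:
    """Group symbol names by the text before their first underscore."""
    out: Counter = Counter()
    for symbol, n in counts.items():
        out[symbol.split("_", 1)[0]] += n
    return out


# Excerpt of the linker output quoted in this post.
SAMPLE = """\
/export/apps/mpich2/lib/libmpich.a(init.o): In function `MPI_Init':
init.c:(.text+0x38): undefined reference to `MPL_env2str'
init.c:(.text+0x55): undefined reference to `MPL_env2bool'
/export/apps/mpich2/lib/libmpich.a(param_vals.o): In function `MPIR_Param_init_params':
param_vals.c:(.text+0xf): undefined reference to `MPL_env2int'
"""

if __name__ == "__main__":
    for archive, counts in summarize_undefined(SAMPLE).items():
        print(archive, dict(counts), dict(prefixes(counts)))
```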

Thanks
John

mpich/libmetis error 13 years 8 months ago #1230

  • olslewfoot
Hi

Following on from this thread, I have compiled and installed OpenMPI and have also recompiled the METIS library against OpenMPI.

I still get some undefined-reference errors when I try to compile the parallel executables.

For example:
/telemac/parallel/parallel_v6p0/intel64/parallelv6p0.a(p_dmax.o): In function `p_dmax_':
p_dmax.f:(.text+0x34): undefined reference to `mpi_allreduce_'


Looking at the functions being called, it seems that functions from both libmetis and libparmetis are required, but I can only link against one of them in the makefile?
If there are any views on which version of METIS or ParMETIS to use with OpenMPI, or on any other way to configure the parallel build, they would be gratefully received!
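Whether a symbol comes from libmetis, libparmetis, or neither can be checked rather than guessed by asking `nm` which archive defines it. Below is a minimal Python sketch. It assumes the GNU binutils `nm -A` layout, where each line is prefixed with `archive:member:`. The symbol names in the comments are illustrative only.

```python
from collections import defaultdict

# ASSUMED GNU `nm -A` layout (archive:member:address TYPE symbol), e.g.
#   libmetis.a:kmetis.o:0000000000000000 T METIS_PartGraphKway
#   libmetis.a:kmetis.o:                 U malloc


def definitions(nm_output: str) -> dict[str, set[str]]:
    """Map each globally defined symbol to the archive(s) that define it."""
    defined: dict[str, set[str]] = defaultdict(set)
    for line in nm_output.splitlines():
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue  # blank or unexpected lines
        location, kind, symbol = parts
        # Upper-case types other than U are global definitions (T, D, B, R, W...);
        # lower-case types are local and invisible to the linker.
        if len(kind) != 1 or not kind.isupper() or kind == "U":
            continue
        defined[symbol].add(location.split(":", 1)[0])
    return dict(defined)


def who_defines(needed: list[str], nm_output: str) -> dict[str, list[str]]:
    """For each needed symbol, list the archives able to satisfy it."""
    defs = definitions(nm_output)
    return {symbol: sorted(defs.get(symbol, ())) for symbol in needed}
```

Usage, with the archive names as placeholders: run `nm -A libmetis.a libparmetis.a > symbols.txt`, then pass the file's contents and the symbol names from the linker errors to `who_defines`. A symbol that maps to an empty list, as `mpi_allreduce_` would here, is not provided by either METIS archive and has to come from the MPI libraries instead.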

Regards

John

Re:mpich/libmetis error 13 years 8 months ago #1240

  • ails
  • Senior Boarder
  • Posts: 140
  • Thank you received: 17
Hello John,

You should check the MPI settings in the systel.ini file. Can you tell us your MPI settings (FC_MPI, LIBS_MPI...)?

OpenMPI needs special flags and we only gave MPICH flags in the default configuration file.

1/ Please check that mpif90 is the one from OpenMPI: which mpif90
2/ The information for LIBS_MPI can be obtained with: mpif90 --showme:link
3/ Apply the changes with cfgmak and maktel install in the parallel directory

I hope this will solve your problem.

Fabien

PS: With MPICH2, the equivalent command is: mpif90 -show
PS: Only libmetis.a (from METIS 4.0) is required
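Steps 1 and 2, and the MPICH2 variant in the PS, can be scripted. Below is a minimal Python sketch that keeps only the linker tokens from the wrapper's output (`-L`, `-l`, `-Wl,`, `-pthread`, and archive paths) so the result can be pasted into LIBS_MPI in systel.ini. The filtering rule is an assumption and does not cover every flag; for example, `-Xlinker` pairs are not handled.

```python
import shlex
import subprocess

LINK_PREFIXES = ("-L", "-l", "-Wl,")


def link_flags(wrapper_output: str) -> str:
    """Keep only the linker-related tokens of an MPI wrapper's output.

    Handles both the full command line printed by MPICH2 (mpif90 -show,
    which starts with the compiler name) and Open MPI's flags-only output
    (mpif90 --showme:link).
    """
    tokens = shlex.split(wrapper_output)
    if tokens and not tokens[0].startswith("-"):
        tokens = tokens[1:]  # drop the underlying compiler, e.g. gfortran/ifort
    kept = [
        t for t in tokens
        if t.startswith(LINK_PREFIXES) or t == "-pthread" or t.endswith((".a", ".so"))
    ]
    return " ".join(kept)


def query_wrapper(wrapper: str = "mpif90", flag: str = "-show") -> str:
    """Run the wrapper; use flag="--showme:link" for Open MPI."""
    result = subprocess.run([wrapper, flag], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        print("LIBS_MPI candidates:", link_flags(query_wrapper()))
    except (OSError, subprocess.CalledProcessError) as err:
        print("could not query the MPI wrapper:", err)
```

For example, an MPICH2-style line such as `gfortran -I/opt/mpich2/include -L/opt/mpich2/lib -lmpichf90 -lmpich -lopa -lmpl -lrt -lpthread` (paths and library list illustrative) reduces to everything after the `-I` option.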

Re:mpich/libmetis error 13 years 8 months ago #1245

  • olslewfoot
Dear Fabien

Thank you: using the correct LIBS_MPI options has solved the linking problem, and I now have compiled parallel executables. :)

I now get a further error when I try a parallel run: an MPI launcher error. :(

The output I get is shown below:
________________________________________________
*** MPI MACHINE ***
MPI machine ok (with 16 processors).
________________________________________________
*** RUNNING ***

MPI launcher : /export/apps/mpich2/bin/mpirun -machinefile mpirun.txt -np 16 out28473_intel64.exe
[proxy:0:0@deepblue.corp.cefas.co.uk] HYDU_create_process (./utils/launch/launch.c:69): execvp error on file out28473_intel64.exe (No such file or directory)

The error is repeated for each processor.

I think the error is generated by Hydra, which tries to create a proxy for each processor (node) but fails in ./utils/launch/launch.c when doing so.

Could you possibly advise me what output I should expect here? It may help me find the problem. Are my RUN_MPI options correct?

John

Re:mpich/libmetis error 13 years 8 months ago #1251

  • ails
Dear John,

I'm a bit confused. Did you compile parallel with the OpenMPI or MPICH library?

I gave you advice for OpenMPI but MPICH2 appears in your launching sequence:
/export/apps/mpich2/bin/mpirun -machinefile mpirun.txt...

As I didn't see any attached file, I can only suggest a checklist:

1/ Please check your PATH (which mpirun) and the RUN_MPI line. It is not necessary to write the full path to mpirun as long as its directory is in your PATH.

2/ Please check that your mpirun accepts the machinefile option. I think that OpenMPI (only?) accepts -hostfile mpirun.txt.

3/ More likely: please check that out28473_intel64.exe exists and has (at least) execute permission. Then try to run:
/export/apps/mpich2/bin/mpirun -machinefile mpirun.txt -np 16 ./out28473_intel64.exe
Add the "." directory to your PATH and export it.
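Point 3 fits the message: Hydra's proxy passes the bare name out28473_intel64.exe to execvp. For a name without a slash, execvp looks only through the directories listed in PATH, not in the run directory. Below is a minimal Python sketch of that lookup rule using `subprocess`, which follows the same rule on POSIX. The executable is a throwaway shell script with a hypothetical name, and `/bin/sh` is assumed to exist.

```python
import os
import stat
import subprocess
import tempfile


def make_fake_exe(run_dir: str, name: str) -> str:
    """Create a tiny executable script standing in for the TELEMAC binary."""
    path = os.path.join(run_dir, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def try_launch(argv0: str, run_dir: str, path_env: str) -> str:
    """Start argv0 from run_dir with the given PATH, as a launcher would.

    A name without '/' is searched only in the PATH directories;
    './name' is resolved against run_dir.
    """
    env = dict(os.environ, PATH=path_env)
    try:
        subprocess.run([argv0], cwd=run_dir, env=env, check=True)
        return "ok"
    except FileNotFoundError:
        return "execvp error: No such file or directory"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        name = "out_demo_intel64.exe"  # hypothetical stand-in name
        make_fake_exe(run_dir, name)
        print(try_launch(name, run_dir, "/usr/bin:/bin"))         # not found
        print(try_launch("./" + name, run_dir, "/usr/bin:/bin"))  # ok
        print(try_launch(name, run_dir, ".:/usr/bin:/bin"))       # ok
```

Adding "." to PATH makes the bare name resolve. However, "." is interpreted relative to whatever directory the process starts in, so an explicit "./" (or an absolute path) is probably the more predictable fix.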

Best regards,

Fabien

Re:mpich/libmetis error 13 years 8 months ago #1253

  • olslewfoot
Dear Fabien

Sorry, I was not completely clear. I compiled the parallel binaries/executables successfully against MPICH2 after you showed me the correct options for systel.ini. I prefer to use MPICH2 rather than OpenMPI if possible.

Checking the points you raised:
1. The output of which mpirun matches RUN_MPI.
2. mpirun accepts the machinefile option but does not necessarily need it. The mpirun.txt file is created in the run directory. I have tried both with and without it.
3. The out#####_intel64.exe file is created in the run directory and is executable by all. I still get an error when I run the command directly as you suggest.

I have also tried setting RUN_MPI according to the Hydra options:
mpiexec -f hosts -n <N> <EXE>

The path to hosts (list of hostnames) is also set as an environment variable.
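The launcher line printed below suggests that `<N>` and `<EXE>` in RUN_MPI are replaced by the processor count and the bare executable name. Below is a minimal Python sketch of that substitution. It is a hypothetical helper, not TELEMAC's own code, and it adds "./" when the name has no directory part, so execvp resolves it in the run directory instead of searching PATH.

```python
import shlex


def expand_run_mpi(template: str, nproc: int, exe: str) -> list[str]:
    """Fill the <N>/<EXE> placeholders of a RUN_MPI-style template.

    Returns an argv list suitable for subprocess.run. The "./" prefix is
    an assumption about how to make the launcher find the executable.
    """
    if "/" not in exe:
        exe = "./" + exe  # force lookup in the run directory, not in PATH
    argv = []
    for token in shlex.split(template):
        if token == "<N>":
            argv.append(str(nproc))
        elif token == "<EXE>":
            argv.append(exe)
        else:
            argv.append(token)
    return argv


if __name__ == "__main__":
    cmd = expand_run_mpi("mpiexec -f hosts -n <N> <EXE>", 16, "out31320_intel64.exe")
    print(shlex.join(cmd))  # mpiexec -f hosts -n 16 ./out31320_intel64.exe
```

Calling the same helper with `nproc=1` gives a one-process launch, which is a quick way to test the MPI setup on its own.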

This produces the same error.
MPI machine ok (with 16 processors).
______________________________________________________________________________
*** RUNNING ***

MPI launcher : /export/apps/mpich2/bin/mpiexec -f hosts -n 16 out31320_intel64.exe
[proxy:0:0@deepblue.corp.cefas.co.uk] HYDU_create_process (./utils/launch/launch.c:69): execvp error on file out31320_intel64.exe (No such file or directory)


There must be a path issue somewhere in the MPICH2 setup.
I've attached my systel.ini and a file with the current PATH (hopefully!).

John

Re:mpich/libmetis error 13 years 8 months ago #1254

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Dear John

This is not a solution but maybe a part of it.
Did you try to run your simulation in parallel with only 1 processor?
This is similar to a scalar computation, but it goes through the whole parallel configuration.
It could help us determine whether or not your problem comes from MPICH2.

Regards
Christophe

Re:mpich/libmetis error 13 years 8 months ago #1255

  • olslewfoot
Hi

Sorry, could you clarify how to do that, please?
Should I set PARALLEL PROCESSORS = 0 in the steering file?
Which command would I use to run the parallel version?

Regards
John
Moderators: borisb
