TOPIC: problem with parallel mode

problem with parallel mode 10 years 10 months ago #11563

  • NHEILI
Hello,
I have a problem in parallel mode: it seems that the function P_INIT, which sets up the master processor, is not called, and I get multiple listing outputs in random order.

At the beginning of the simulation I get this message:
" +> /home/nheili/v6p3r1/builds/ubugfopenmpi/bin/partel < PARTEL.PAR >> partel_T2DGEO.log
STOP 0
partitioning: T2DREF
+> /home/nheili/v6p3r1/builds/ubugfopenmpi/bin/partel < PARTEL.PAR >> partel_T2DREF.log
STOP 0 "
At the end I get:
"runCAS: I could not copy the output files back from the temporary directory:
/home/nheili/v6p3r1/examples/telemac2d/malpasset/t2d_malpasset-small.cas_2014-01-15-11h49min00s
|processECR: could not find the listing file: PE00001-00001.LOG
"
Does anyone know what is happening, please?
Regards
The administrator has disabled public write access.

problem with parallel mode 10 years 10 months ago #11566

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
Hi,

What version of METIS are you using?

Could you post the file partel_T2DGEO.log or partel_T2DREF.log here?
They should be in the temporary folder.
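
If it helps, here is a small Python sketch for collecting those logs. It assumes the temporary folder is the time-stamped directory created next to the .cas file (such as the t2d_malpasset-small.cas_2014-01-15-11h49min00s folder named in the error above) and that the logs sit directly inside it. The function names are made up for illustration, and the demo runs on a throwaway directory rather than a real run.

```python
import tempfile
from pathlib import Path


def find_partel_logs(tmp_dir):
    """Return the partel_*.log files found directly inside tmp_dir, sorted by name."""
    return sorted(Path(tmp_dir).glob("partel_*.log"))


def print_partel_logs(tmp_dir):
    """Print each partel log with a small header so it can be pasted into a post."""
    for log_path in find_partel_logs(tmp_dir):
        print(f"===== {log_path.name} =====")
        print(log_path.read_text(errors="replace"))


# Demo on a throwaway directory standing in for the real temporary folder.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "partel_T2DGEO.log").write_text("placeholder contents\n")
    print_partel_logs(demo_dir)
```

To use it on a real run, call print_partel_logs with the path of the temporary folder instead of the demo directory.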
There are 10 types of people in the world: those who understand binary, and those who don't.

problem with parallel mode 10 years 10 months ago #11568

  • NHEILI
Thank you.
I am using metis-5.0.2.
Here are all my files.

problem with parallel mode 10 years 10 months ago #11569

  • NHEILI
Running your simulation :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

/usr/bin/mpiexec.openmpi -wdir /home/nheili/v6p3r1/examples/telemac2d/malpasset/t2d_malpasset-small.cas_2014-01-15-13h59min17s -n 2 /home/nheili/v6p3r1/examples/telemac2d/malpasset/t2d_malpasset-small.cas_2014-01-15-13h59min17s/out_t2d_malpasset-small


MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 2
EXECUTABLE FILE: /home/nheili/v6p3r1/examples/telemac2d/malpasset/t2d_malpasset-small.cas_2014-01-15-13h59min17s/A.EXE
BARRIER PASSED
BACK FROM P_INIT

TELEMAC-2D LISTING

TTTTT  EEEEE  L      EEEEE  M   M  AAAAA  CCCCC
  T    E      L      E      MM MM  A   A  C
  T    EEE    L      EEE    M M M  AAAAA  C
  T    E      L      E      M   M  A   A  C
  T    EEEEE  LLLLL  EEEEE  M   M  A   A  CCCCC

2D VERSION 6.3 FORTRAN 90
WITH SEVERAL TRACERS
COUPLED WITH SISYPHE AND TOMAWAC

END OF FILE FOR DAMOCLES

********************************************
* LECDON:                                  *
* AFTER CALLING DAMOCLES                   *
* CHECKING THE DATA READ                   *
* FROM THE STEERING FILE                   *
********************************************

EXITING LECDON. TITLE OF THE STUDY:
Le barrage de MALPASSET

OPENING FILES FOR TELEMAC2D

*****************************
* MEMORY ALLOCATION         *
*****************************

LIT: ABNORMAL END OF FILE
WHILE TRYING TO READ A
RECORD OF 72 VALUES
OF TYPE: CH
ON LOGICAL UNIT: 1

PLANTE: PROGRAM STOPPED AFTER AN ERROR
_____________
runcode::main:
/home/nheili/v6p3r1/examples/telemac2d/malpasset:
|runCode: Fail to run
|/usr/bin/mpiexec.openmpi -wdir /home/nheili/v6p3r1/examples/telemac2d/malpasset/t2d_malpasset-small.cas_2014-01-15-13h59min17s -n 2 /home/nheili/v6p3r1/examples/telemac2d/malpasset/t2d_malpasset-small.cas_2014-01-15-13h59min17s/out_t2d_malpasset-small
|~~~~~~~~~~~~~~~~~~
|CALLING P_INIT
| CALLING P_INIT
|STOP 2
|STOP 2
|
|Backtrace for this error:
| + function plante_ (0x8E54FE)
| + function lit_ (0x66FE14)
| + function readgeo1_ (0x70F24A)
| + function almesh_ (0x670CA7)
| + function point_telemac2d_ (0x4289E6)
| + in the main program
| from file homere_telemac2d.f
| + /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f8c6103376d]
|
|mpiexec.openmpi has exited due to process rank 1 with PID 27122 on
|node nheili exiting without calling "finalize". This may
|have caused other processes in the application to be
|terminated by signals sent by mpiexec.openmpi (as reported here).

problem with parallel mode 10 years 10 months ago #11570

  • yugi
Partel seems to have been compiled in serial.
Do you have the -DHAVE_MPI option in your configuration file?
If not, add it to cmd_obj.

Then you should recompile TELEMAC:
compileTELEMAC.py --clean
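
A quick way to check for the flag is sketched below in Python. It is only a sketch: it assumes the configuration file stores cmd_obj on a single "key: value" line, and both the function name cmd_obj_has_mpi and the sample configuration text are made up for illustration, not taken from a real systel file.

```python
import re


def cmd_obj_has_mpi(config_text):
    """Return True if a non-commented cmd_obj line contains the -DHAVE_MPI flag."""
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip commented-out entries
        match = re.match(r"cmd_obj\s*[:=]\s*(.*)", stripped)
        if match and "-DHAVE_MPI" in match.group(1).split():
            return True
    return False


# Hypothetical configuration fragment, for illustration only.
sample_config = """
[ubugfopenmpi]
cmd_obj: mpif90 -c -O3 -DHAVE_MPI <mods> <incs> <f95name>
"""

print(cmd_obj_has_mpi(sample_config))
```

On a real installation, read the configuration file into a string and pass it to the function. If it returns False, add the flag to cmd_obj before recompiling.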
The following user(s) said Thank You: NHEILI

problem with parallel mode 10 years 10 months ago #11576

  • NHEILI
Thank you. That works!

problem with parallel mode 10 years 10 months ago #11683

  • NHEILI
Hello,
Parallel mode works well, but if the number of processors = 1,
the run hangs after the validation procedure (the loop does not finish). Can anyone help me?
Regards

problem with parallel mode 10 years 10 months ago #11684

  • 716469
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
Hi,

I have a similar problem to yours. In parallel I can only run on 1 processor; as soon as I increase the number I get a segmentation fault. I found a forum topic that might fix the problem, but I have not tried it yet: I already have 5 cases running on 1 processor that should finish in a couple of days, and only then will I make the change and recompile the whole system. The topic is:

www.opentelemac.org/index.php/kunena/12-...o-metis-partmeshdual

Basically, it suggests changing IDXTYPEWIDTH from 64 to 32 in metis.h, recompiling METIS, and then recompiling the whole of TELEMAC. I currently have 64, so I will wait until my cases finish, then change it to 32 and recompile. That seems to have solved that user's problem, so maybe it will help me too. Do you have 64 or 32 in your metis.h file?
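
To see which value a given metis.h uses without opening it by hand, something like the Python sketch below could work. It assumes the header sets the width with a plain "#define IDXTYPEWIDTH <number>" line; the function name idx_type_width and the sample header text are made up for illustration.

```python
import re


def idx_type_width(header_text):
    """Return the IDXTYPEWIDTH value defined in the header text, or None if absent."""
    match = re.search(
        r"^\s*#\s*define\s+IDXTYPEWIDTH\s+(\d+)", header_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


# Hypothetical header fragment, for illustration only.
sample_header = """
/* width of the integer type used for indices */
#define IDXTYPEWIDTH 64
"""

print(idx_type_width(sample_header))
```

For a real check, read the contents of metis.h from the METIS source tree and pass them to idx_type_width.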

Kind Regards!

Violeta

problem with parallel mode 10 years 10 months ago #11687

  • NHEILI
Hello,
Sorry, I think I wasn't clear.
The program runs if the number of processors is greater than 1. When the number of processors = 1, the run hangs after the validation procedure (the loop does not finish).
Kind Regards,

problem with parallel mode 10 years 10 months ago #11689

  • 716469
Thanks for the clarification. I have been searching everywhere for a solution to my own problem and jumped too quickly to the conclusion that yours was the same :), hoping that whatever fixed yours would help me too. You are ahead of me. Good luck, and hopefully somebody will come back quickly with an answer to your problem.

Kind Regards!

Violeta