TOPIC: Error code 2 in parallel mode

Error code 2 in parallel mode 7 years 3 months ago #27524

  • lpilavoine
Hello everyone,

I am currently running into another problem. I have absolutely no idea where that could come from.

Here's the context: I have a simulation with a single class of sediment (30 µm) transported by suspension only. So far, so good. What I'd like to do is to run the simulation with the V7P2 in parallel mode.

And that's where something goes wrong. When I run the simulation without parallel mode, with either V6P2 or V7P2, it runs fine. But the second I try to launch it in parallel, I get this error:
PLANTE : ARRET DU PROGRAMME APRES ERREUR
RETURNING EXIT CODE:            2

job aborted:
rank: node: exit code[: error message]

And this only occurs when Telemac is coupled with Sisyphe: Telemac seems to run correctly on its own, even in parallel. (I didn't run the full simulation, as it is 5 000 000 seconds long...)

Does anyone know where that could come from? It really only happens when Sisyphe is on and I run it in parallel.

I have attached below my steering file for Sisyphe as well as the fortran file used.

Thanks in advance.
The administrator has disabled public write access.

Error code 2 in parallel mode 7 years 3 months ago #27525

  • Phelype
Hello,

Please post a longer chunk of the error message or upload all the files necessary to reproduce the error.

It is hard to tell with the information you provided.

Best regards,

Phelype

Error code 2 in parallel mode 7 years 3 months ago #27527

  • lpilavoine
Well, that's more or less the whole error message. After that there is only the list of processes, with the first one saying "Process 0 exited without calling finalize" and "123" for the seven others (see the full image attached below).

I'm not sure I'm allowed to give access to the geometry file since this is a real river and a real dam (I'd have to check that out)...

I also thought that the fortran file could have played a part, but even without it the simulation doesn't run far. It gets a bit further (especially without the CONLIT subroutine setting the incoming sediment), but it still stops pretty quickly.

Error code 2 in parallel mode 7 years 3 months ago #27528

  • mafknaapen
When running in parallel, the system usually creates log files (.log) and a sortie file. These may contain additional information on the precise error.
Dr Michiel Knaapen
Senior Scientist
T +44 (0)1491 822399

HR Wallingford, Howbery Park, Wallingford, Oxfordshire OX10 8BA, United Kingdom
T +44 (0)1491 835381, F +44 (0)1491 832233
www.hrwallingford.com
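
A minimal sketch of how those files could be searched, in case it helps. It assumes the log and sortie files end in `.log` and `.sortie` and sit somewhere under the working directory; the marker strings are guesses based on the message quoted in this thread, and `find_error_context` and `scan_directory` are hypothetical helper names, not part of TELEMAC.

```python
from pathlib import Path

# Assumption: "PLANTE" and "exited without calling finalize" come from the
# messages quoted in this thread; "ERREUR" and "ERROR" are generic guesses.
ERROR_MARKERS = ("PLANTE", "ERREUR", "ERROR", "exited without calling finalize")


def find_error_context(text, markers=ERROR_MARKERS, before=10):
    """Return (line_number, context_lines) for each line containing a marker.

    context_lines holds up to `before` preceding lines plus the matching line,
    since the useful message is often printed just above the final abort.
    """
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        upper = line.upper()
        if any(marker.upper() in upper for marker in markers):
            start = max(0, i - before)
            hits.append((i + 1, lines[start:i + 1]))
    return hits


def scan_directory(directory, patterns=("*.log", "*.sortie")):
    """Print the context of every marker hit in matching files (assumed names)."""
    for pattern in patterns:
        for path in sorted(Path(directory).rglob(pattern)):
            text = path.read_text(errors="replace")
            for lineno, context in find_error_context(text):
                print(f"--- {path}:{lineno}")
                print("\n".join(context))
```

Calling `scan_directory(".")` from the folder where the run was launched would print whatever precedes each abort message, one block per processor file if each processor writes its own log.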

Error code 2 in parallel mode 7 years 3 months ago #27532

  • Phelype
It doesn't have to be the mesh of your study. The actual geometry of the mesh doesn't seem to be the problem here.

But what would be useful is a minimum working example that throws the exact same error. It would be a lot easier to debug, since there is no error message (above the "PLANTE : ARRET DU PROGRAMME APRES ERREUR").

Error code 2 in parallel mode 7 years 3 months ago #27535

  • cyamin
Hello,

Have you checked that the source subroutines used in your fortran file have not changed from v6p2 to v7p2? There may be changes in place that affect parallel runs. If I wanted to be certain, I would take the v7p2 sources and re-apply my changes to them for the v7p2 runs.

Costas
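
One way to carry out that check is to diff each stock v6p2 subroutine against its v7p2 counterpart. The sketch below uses only the Python standard library; `diff_sources` and `diff_files` are hypothetical helper names, and the file locations are left to the caller because the thread does not give them.

```python
import difflib
from pathlib import Path


def diff_sources(old_text, new_text, old_label="v6p2", new_label="v7p2"):
    """Return a unified diff (list of lines) between two source versions."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    ))


def diff_files(old_path, new_path):
    """Diff two files on disk; undecodable bytes are replaced, not fatal."""
    old_text = Path(old_path).read_text(errors="replace")
    new_text = Path(new_path).read_text(errors="replace")
    return diff_sources(old_text, new_text, str(old_path), str(new_path))
```

An empty result means the subroutine is unchanged between the two releases; anything else shows which lines need re-adapting before the user changes are copied across.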

Error code 2 in parallel mode 7 years 3 months ago #27537

  • lpilavoine
Thank you all for your answers.


@mafknaapen: Well, the log files don't seem to give any more info than what the shell is showing me... Still the same "error code 2" as the only piece of information. Is there a way to have more info displayed when getting an error?


@Phelype: I have tried to run several Sisyphe examples (with both a telemac2d and a Sisyphe steering file), and two out of three worked. The examples "canal_solide_discharge_inflow" and "littoral" both ran, but I ended up with the same error when I tried to run the "conservation" example.


@cyamin: I'll try that, but my case runs into an error of empty layers pretty quickly without the fortran subroutines (something that doesn't happen if I run the simulation without the parallel mode). I'll try to see whether re-adapting the subroutines makes the simulation work better though.

Error code 2 in parallel mode 7 years 3 months ago #27538

  • cyamin
In my conlit.f from the trunk (but this change refers to v7p2) I have found the following comment for example:
!history J,RIEHME (ADJOINTWARE)
!+ November 2016
!+ V7P2
!+ Replaced EXTERNAL statements to parallel functions / subroutines
!+ by the INTERFACE_PARALLEL
That would definitely affect parallel runs. v6p2 is too old for its code to keep working with v7p2: many functional changes happened between those releases. I would be surprised if it worked out of the box.

Costas

Error code 2 in parallel mode 7 years 3 months ago #27539

  • lpilavoine
Oh, thanks for that!
The conlit subroutine in my Sisyphe source files doesn't look any different from the V6P2 one, but I'd like to try what's written in this comment.

Here's the only change mentioned in my V7P2 version of the conlit.f file:
!history R. KOPMANN (BAW)
!+ 13/07/2016
!+ V7P2
!+ Integrating liquid boundary file for QS
!

I'll try yours.
Does that mean that I have to remove all EXTERNAL declarations from the file and add a "USE INTERFACE_PARALLEL" at the start of the subroutines?
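
Whatever the answer turns out to be, the EXTERNAL declarations first have to be located. A rough sketch for listing them follows; it assumes the parallel functions share a `P_` name prefix (names such as `P_DSUM` are used here for illustration only), and `find_external_names` is a hypothetical helper. It only handles plain single-line `EXTERNAL` statements, not continuation lines or the attribute form.

```python
import re

# Matches "EXTERNAL A, B" and "EXTERNAL :: A, B" at the start of a statement.
EXTERNAL_RE = re.compile(r"^\s*EXTERNAL\b\s*(?:::)?\s*(.+)$", re.IGNORECASE)


def find_external_names(source, prefix="P_"):
    """Return (line_number, name) for EXTERNAL names starting with `prefix`.

    Assumption: the parallel routines can be recognised by the prefix alone.
    """
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.strip().startswith("!"):
            continue  # whole-line comment
        code = line.split("!", 1)[0]  # drop any trailing comment
        match = EXTERNAL_RE.match(code)
        if not match:
            continue
        for name in match.group(1).split(","):
            name = name.strip()
            if name.upper().startswith(prefix.upper()):
                found.append((lineno, name))
    return found
```

Running it over the user fortran file gives the line numbers to review against the v7p2 sources; it does not rewrite anything.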

Error code 2 in parallel mode 7 years 3 months ago #27540

  • cyamin
OK, the change I mentioned probably came later than the version you have. In any case, I would strongly advise taking the source files from your installation and redoing every change you have made so far in your v6p2 fortran file. That way you won't miss smaller changes in the code.

Your code may then need some adapting to new functionality, but that will become apparent while you make the changes or, eventually, in the computation.

Costas