TOPIC: Problems after exiting MPI / Simulation

Problems after exiting MPI / Simulation 3 years 8 months ago #37867

  • KMou
Hello everyone,

I carried out some steady-state test simulations to calculate mixed suspended sediment transport in a reservoir using T2D and Gaia. However, the final .slf files are not written and the program crashes after exiting MPI. The results in the working directory seem to be alright and the log-files show no errors.

Do you have any idea what causes this problem, or have you experienced something similar?
Could the problem be caused by an error in the steering file?
I appreciate any advice and have attached the working directory and steering files.
I had the following message in the terminal:
END OF TIME LOOP

 EXITING MPI

double free or corruption (!prev)

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
double free or corruption (!prev)

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x7f456933e8b0 in ???
#1  0x7f456933dae3 in ???
#2  0x7f45689b783f in ???
#3  0x7f45689b77bb in ???
#4  0x7f45689a2534 in ???
#5  0x7f45689f9507 in ???
#0  0x7fdcf6cea8b0 in ???
#1  0x7fdcf6ce9ae3 in ???
#2  0x7fdcf636383f in ???
#3  0x7fdcf63637bb in ???
#4  0x7fdcf634e534 in ???
#5  0x7fdcf63a5507 in ???
#6  0x7fdcf63abc19 in ???
#7  0x7fdcf63ad73b in ???
#8  0x556129fed494 in ???
#6  0x7f45689ffc19 in ???
#7  0x7f4568a0173b in ???
#8  0x55bfdead8494 in ???
#9  0x55bfde7f0025 in ???
#10  0x55bfde7bc42f in ???
#11  0x55bfde7b6842 in ???
#9  0x556129d05025 in ???
#10  0x556129cd142f in ???
#11  0x556129ccb842 in ???
#12  0x7fdcf635009a in ???
#13  0x556129ccb879 in ???
#14  0xffffffffffffffff in ???
#12  0x7f45689a409a in ???
#13  0x55bfde7b6879 in ???
#14  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node lww-034 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/modelling/telemac/v8p2/scripts/python3/telemac2d.py", line 7, in <module>
    main('telemac2d')
  File "/home/modelling/telemac/v8p2/scripts/python3/runcode.py", line 271, in main
    run_study(cas_file, code_name, options)
  File "/home/modelling/telemac/v8p2/scripts/python3/execution/run_cas.py", line 157, in run_study
    run_local_cas(my_study, options)
  File "/home/modelling/telemac/v8p2/scripts/python3/execution/run_cas.py", line 65, in run_local_cas
    my_study.run(options)
  File "/home/modelling/telemac/v8p2/scripts/python3/execution/study.py", line 612, in run
    self.run_local()
  File "/home/modelling/telemac/v8p2/scripts/python3/execution/study.py", line 440, in run_local
    run_code(self.run_cmd, self.sortie_file)
  File "/home/modelling/telemac/v8p2/scripts/python3/execution/run.py", line 182, in run_code
    raise TelemacException('Fail to run\n'+exe)
utils.exceptions.TelemacException: Fail to run
/usr/bin/mpiexec -wdir /home/IWS/mouris/Desktop/Model_tests/Gaia_Banja/Steady_state_plusSed4/run_steady_tel.cas_2021-02-22-10h33min32s -n 2 /home/IWS/mouris/Desktop/Model_tests/Gaia_Banja/Steady_state_plusSed4/run_steady_tel.cas_2021-02-22-10h33min32s/out_telemac2d

Best,

Kilian
Attachments:
The administrator has disabled public write access.

Problems after exiting MPI / Simulation 3 years 8 months ago #37869

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi Kilian
The END OF TIME LOOP message indicates that the simulation reached the end without problems,
so I suspect a problem when the results are merged.
You could try merging the results manually and check.
You could also have a look at the gretel.log file, if it exists.

Hope this helps
Christophe

Problems after exiting MPI / Simulation 3 years 8 months ago #37874

  • KMou
Thank you very much for the prompt reply!
I loaded the result files from the working directory into Blue Kenue and they are alright.
It seems that only the very last steps (merging, handling the result files and deleting the working directory) are not happening. I'm not sure whether this is due to a mistake in the .cas files or whether there might be a configuration or hardware problem.

I did not experience this problem with Telemac-only simulations, and the Gaia examples (Hippodrome and mud-conservation) worked as well.
The gretel.log file does not exist. There are partel.logs which all indicate "NORMAL TERMINATION".

Problems after exiting MPI / Simulation 3 years 8 months ago #37875

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Is there a gretel.par file in the working directory?

If it works on other test cases, this is probably not a mistake in the .cas file or a configuration problem, but rather a hardware problem.
Running the merge manually for your case would let you check this.

Regards
Christophe

Problems after exiting MPI / Simulation 3 years 8 months ago #37881

  • KMou
Thanks for your help!

There was no gretel.par file in the working directory.
So I guess the execution stopped before gretel.py was called?

I followed your recommendation and ran gretel.py manually, which worked perfectly:

gretel.py --geo-file=T2DGEO --res-file=GAIRES --ncsize=2 --bnd-file=T2DCLI

This is a viable solution, although I don't know exactly why the problem occurs in this case.
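
For repeated runs, the manual merge above could be wrapped in a small script. The following is only a sketch under stated assumptions: it assumes gretel.py is on the PATH (for example after sourcing the TELEMAC environment), it reuses exactly the options from the command above, and the helper names build_gretel_command and merge_results are hypothetical, not part of the TELEMAC scripts.

```python
import subprocess


def build_gretel_command(geo_file, res_file, ncsize, bnd_file):
    """Build the gretel.py call shown above as an argument list."""
    return [
        "gretel.py",
        f"--geo-file={geo_file}",
        f"--res-file={res_file}",
        f"--ncsize={ncsize}",
        f"--bnd-file={bnd_file}",
    ]


def merge_results(work_dir, geo_file="T2DGEO", res_file="GAIRES",
                  ncsize=2, bnd_file="T2DCLI"):
    """Run the merge inside the temporary working directory.

    Assumes gretel.py is on the PATH. Raises
    subprocess.CalledProcessError if gretel.py exits with an error.
    """
    cmd = build_gretel_command(geo_file, res_file, ncsize, bnd_file)
    subprocess.run(cmd, cwd=work_dir, check=True)
    return cmd
```

merge_results would be called with the path of the temporary working directory that the crashed run left behind, once per result file that still needs merging.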

Regards

Kilian

Problems after exiting MPI / Simulation 3 years 7 months ago #38180

  • KMou
Hello everyone,

Unfortunately, I have not yet been able to fix the problem. Since the simulation itself works, it is hard to find potential errors in the steering files (if they are the cause at all) and to trace the error back. Manual merging works but is not ideal.

I noticed that the problem only occurs when simulating suspended sediment transport (SUSPENSION FOR ALL SANDS = YES), even when I set the inflow concentrations to 0. Do you have any idea what this could be, or have you experienced something similar?

I am grateful for any advice.

Regards,

Kilian

I just ran gdb, which gave me the following:
END OF TIME LOOP

 EXITING MPI

[Thread 0x7ffff5ad0700 (LWP 22139) exited]
[Thread 0x7ffff64c2700 (LWP 22138) exited]
double free or corruption (!prev)

Thread 1 "out_telemac2d" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) 

Problems after exiting MPI / Simulation 3 years 6 months ago #38511

  • jescobar
  • Fresh Boarder
  • Posts: 22
  • Thank you received: 2
Dear Kilian,

Did you find a solution for this? Or do you have to merge your results manually?

I am experiencing the same problem and it indeed happens only when the keyword "SUSPENSION FOR ALL SANDS" is active.

Best,

Sebastian

Problems after exiting MPI / Simulation 3 years 6 months ago #38520

  • KMou
Hi Sebastian,

Unfortunately, I didn't find a solution, but it's good to know that I'm not the only one with this kind of problem.
Are you also using Telemac2d and Gaia?

Kind regards,

Kilian

Problems after exiting MPI / Simulation 3 years 5 months ago #38658

  • Arnaud_Vallée
Hello,

Do you have any sources in your model?

I had the same error. When there are sources in the domain, you have to use the following keyword: SUSPENDED SEDIMENTS CONCENTRATION VALUES AT THE SOURCES, even if you set the value to zero at every source.
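
A quick check for this combination could be scripted before launching a run. The sketch below is an illustration under several assumptions, not part of the TELEMAC scripts: it treats lines starting with "/" as comments, detects sources through the TELEMAC-2D keyword ABSCISSAE OF SOURCES (an assumption about how the sources are declared), accepts YES, TRUE, OUI and VRAI as true values, and does plain substring matching rather than real steering-file parsing. All function names are hypothetical.

```python
import re

TRUE_VALUES = {"YES", "TRUE", "OUI", "VRAI"}  # assumed spellings of "true"
SUSPENSION_KEY = "SUSPENSION FOR ALL SANDS"
SOURCE_CONC_KEY = "SUSPENDED SEDIMENTS CONCENTRATION VALUES AT THE SOURCES"
SOURCES_KEY = "ABSCISSAE OF SOURCES"  # assumed marker for "model has sources"


def active_lines(cas_text):
    """Return upper-cased, non-empty lines that are not '/' comments."""
    lines = (ln.strip().upper() for ln in cas_text.splitlines())
    return [ln for ln in lines if ln and not ln.startswith("/")]


def has_keyword(cas_text, keyword):
    """True if the keyword appears on any non-comment line."""
    return any(keyword in ln for ln in active_lines(cas_text))


def suspension_enabled(gaia_cas):
    """True if SUSPENSION FOR ALL SANDS is set to a true value."""
    pattern = re.compile(re.escape(SUSPENSION_KEY) + r"\s*[=:]\s*(\w+)")
    for ln in active_lines(gaia_cas):
        match = pattern.search(ln)
        if match:
            return match.group(1) in TRUE_VALUES
    return False


def source_concentration_missing(t2d_cas, gaia_cas):
    """True if suspension is on, sources exist, and the keyword is absent."""
    return (suspension_enabled(gaia_cas)
            and has_keyword(t2d_cas, SOURCES_KEY)
            and not has_keyword(gaia_cas, SOURCE_CONC_KEY))
```

Both arguments are the text contents of the TELEMAC-2D and Gaia steering files. A True result only flags the combination described above and does not prove that it is the cause of the crash.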

Arnaud