
TOPIC: Message error

Message error 7 years 6 months ago #26307

  • jonathan
Hello all,

My model stopped running after 20 days with this error message. Does anyone know where the error comes from and how I can solve it, please?

Regards

================================================================================
ITERATION 353160 TIME: 20 D 10 H 30 MN 0.0000 S ( 1765800.0000 S)
ADVECTION STEP
DIFFUSION-PROPAGATION STEP
EQUNOR (BIEF) : 37 ITERATIONS, RELATIVE PRECISION: 0.9940207E-03
BALANCE OF WATER VOLUME
VOLUME IN THE DOMAIN : 0.5735012E+13 M3
FLUX BOUNDARY 1: -8330333. M3/S ( >0 : ENTERING <0 : EXITING )
FLUX BOUNDARY 2: 1034479. M3/S ( >0 : ENTERING <0 : EXITING )
RELATIVE ERROR IN VOLUME AT T = 0.1766E+07 S : 0.2231251E-05
@STREAMLINE::SCARACT: THE NUMBER OF TRACEBACK INTERFACE CROSSINGS IGEN > 99



PLANTE: PROGRAM STOPPED AFTER AN ERROR
RETURNING EXIT CODE: 2
The administrator has disabled public write access.

Message error 7 years 6 months ago #26330

  • riadh
Hello

Your problem is close to the one discussed in this topic.

kind regards

Riadh

Message error 7 years 6 months ago #26464

  • jonathan
Thank you very much!!! I solved this problem but a new one appeared:

ITERATION 1046880 TIME: 24 D 5 H 36 MN 0.0000 S ( 2093760.0000 S)
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)

runcode::main:
:
|runCode: Fail to run
|mpirun -np 32 out_forcing
|~~~~~~~~~~~~~~~~~~
|[23:cwc007] unexpected disconnect completion event from [0:cwc006]
|Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
|internal ABORT - process 23
|[13:cwc007] unexpected disconnect completion event from [0:cwc006]
|Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
|internal ABORT - process 13
|[14:cwc007] unexpected disconnect completion event from [0:cwc006]
|Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
|internal ABORT - process 14
|[16:cwc007] unexpected disconnect completion event from [0:cwc006]
|Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
|internal ABORT - process 16
|[20:cwc007] unexpected disconnect completion event from [0:cwc006]
|Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
|internal ABORT - process 20
|[21:cwc007] unexpected disconnect completion event from [0:cwc006]
|Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
|internal ABORT - process 21
|[22:cwc007] unexpected disconnect completion event from [0:cwc006]
|Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
|internal ABORT - process 22
|[15:cwc007] unexpected disconnect completion event from [0:cwc006]
|Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
|internal ABORT - process 15
|[17:cwc007] unexpected disconnect completion event from [0:cwc006]
|Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
|internal ABORT - process 17
|~~~~~~~~~~~~~~~~~~


Does anyone know where it could come from?

jonathan

Message error 7 years 6 months ago #26465

  • jonathan
I should probably have said that I had already run the model for one month (01/01/2015) and it worked properly. I then used that first month as a spin-up file to run another model for the next month (01/02/2015), and this error appeared:

ITERATION 1046880 TIME: 24 D 5 H 36 MN 0.0000 S ( 2093760.0000 S)
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)

(followed by the same runcode::main / mpirun output as in my previous post)

Message error 7 years 6 months ago #26466

  • c.coulet
  • Moderator
Hi
Are you running in parallel?
Could your job have a limited execution time?
regards
Christophe

Message error 7 years 6 months ago #26467

  • jonathan
I don't think so, because the model stopped running after 11 hours and the execution time is limited to 50 hours.

Message error 7 years 6 months ago #26468

  • c.coulet
  • Moderator
OK, but the only information we can see is:
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)

This looks like someone or something killed your job...
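
For reference, signal 9 is SIGKILL on Linux, which a process cannot catch or ignore, so the solver itself gets no chance to print an error. Below is a minimal Python sketch for decoding an exit status into a signal name. The helper name `describe_exit` is hypothetical, and it assumes the usual conventions: Python's `subprocess` reports death by signal N as `-N`, and POSIX shells report it as `128 + N`.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Explain an exit status in terms of signals, where possible."""
    if returncode < 0:
        # Python's subprocess module reports death by signal N as -N.
        signum = -returncode
    elif returncode > 128:
        # By convention, shells report death by signal N as 128 + N.
        signum = returncode - 128
    else:
        # The program ended by itself and chose this exit code.
        return f"exited on its own with code {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        return f"terminated by unknown signal {signum}"
    return f"terminated by signal {signum} ({name})"


print(describe_exit(137))  # typical shell status after a SIGKILL
print(describe_exit(2))    # e.g. the exit code returned after PLANTE
```

An exit code above 128 is only a convention, so the result is a hint about what happened rather than proof.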

Regards
Christophe

Message error 7 years 6 months ago #26476

  • josekdiaz
  • Expert Boarder
Hi jonathan,


As Christophe pointed out, something killed your job. This happened to me before on Linux (Ubuntu 16.04): the kernel was terminating jobs that hogged too much memory. I had a background process taking too many resources, and the kernel ended up killing all the 'heavy' ones.

It was solved when I updated the kernel to a more recent version and kept the system as clean as I could.
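
One way to check whether the kernel's out-of-memory killer was involved is to search the kernel log for its messages. The following is a minimal Python sketch, not a definitive tool: the function name `find_oom_kills` is hypothetical, the sample log lines are invented for illustration, and the pattern assumes the usual Linux wording ("Out of memory: Kill process ..." on older kernels, "Out of memory: Killed process ..." on newer ones).

```python
import re
from typing import List, Tuple

# Assumed kernel wording; matches "Kill process" and "Killed process".
OOM_PATTERN = re.compile(
    r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)",
    re.IGNORECASE,
)


def find_oom_kills(log_text: str) -> List[Tuple[int, str]]:
    """Return (pid, process name) for each OOM-killer event found."""
    return [
        (int(m.group(1)), m.group(2))
        for m in OOM_PATTERN.finditer(log_text)
    ]


# Invented sample lines, only to show the expected input format.
sample = (
    "[81234.1] Out of memory: Killed process 4242 (out_forcing) total-vm:1kB\n"
    "[81234.2] usb 1-1: new high-speed USB device number 2\n"
    "[90000.7] Out of memory: Kill process 5151 (python) score 900\n"
)

for pid, name in find_oom_kills(sample):
    print(f"OOM killer hit pid {pid} ({name})")
```

The text to scan would typically come from `dmesg` or `journalctl -k` on the compute node where the job died. On a cluster this usually needs the help of the system administrator.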

Regards,

José Díaz.

Message error 7 years 6 months ago #26537

  • jonathan
Thank you all, the problem is solved!!!